Abstract

In  this  paper  linear  canonical  correlation  analysis  (LCCA)  is  generalized  by  applying  a  structured 
transform to the joint probability distribution of the considered pair of random vectors, i.e., a
transformation of the joint probability measure defined on their joint observation space. This framework,
called  measure  transformed  canonical  correlation  analysis  (MTCCA),  applies  LCCA  to  the  data  after 
transformation  of  the  joint  probability  measure.  We  show  that  judicious  choice  of  the  transform  leads  to 
a  modified  canonical  correlation  analysis,  which,  in  contrast  to  LCCA,  is  capable  of  detecting  non-linear 
relationships  between  the  considered  pair  of  random  vectors.  Unlike  kernel  canonical  correlation  analysis, 
where  the  transformation  is  applied  to  the  random  vectors,  in  MTCCA  the  transformation  is  applied  to 
their  joint  probability  distribution.  This  results  in  performance  advantages  and  reduced  implementation 
complexity.  The  proposed  approach  is  illustrated  for  graphical  model  selection  in  simulated  data  having 
non-linear  dependencies,  and  for  measuring  long-term  associations  between  companies  traded  in  the 
NASDAQ  and  NYSE  stock  markets. 


Index  Terms 

Association  analysis,  canonical  correlation  analysis,  graphical  model  selection,  multivariate  data 
analysis,  probability  measure  transform. 


I.  Introduction 

Linear  canonical  correlation  analysis  (LCCA)  [1]  is  a  technique  for  multivariate  data  analysis  and 
dimensionality  reduction,  which  quantifies  the  linear  associations  between  a  pair  of  random  vectors.  In 
particular, LCCA generates a sequence of pairwise unit variance linear combinations of the considered
random vectors, such that the Pearson correlation coefficient between the elements of each pair is maximal,
and  each  pair  is  uncorrelated  with  its  predecessors.  The  coefficients  of  these  linear  combinations,  called  the 
linear  canonical  directions,  give  insight  into  the  underlying  relationships  between  the  random  vectors.  They 
are  easily  obtained  by  solving  a  simple  generalized  eigenvalue  decomposition  (GEVD)  problem,  which 
only  involves  the  covariance  and  cross-covariance  matrices  of  the  considered  random  vectors.  LCCA  has 
been  applied  to  blind  source  separation  [3],  image  set  matching  [4],  direction-of-arrival  estimation  [5], 
[6],  data  fusion  and  group  inference  in  medical  imaging  data  [7],  localization  of  visual  events  associated 
with sound sources [8], audio-video synchronization [9], and undersea target classification [10], among others.

The  Pearson  correlation  coefficient  is  only  sensitive  to  linear  associations  between  random  variables. 
Therefore,  in  cases  where  the  considered  random  vectors  are  statistically  dependent  yet  uncorrelated, 
LCCA  is  not  an  informative  tool. 
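
A minimal numerical sketch of this limitation, assuming a scalar pair with X standard normal and Y = X^2 (an illustrative choice, not an example taken from the paper): Y is a deterministic function of X, yet the Pearson correlation is close to zero because Cov[X, X^2] = E[X^3] = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # symmetric about zero
y = x**2                           # fully determined by x, hence dependent

# Pearson correlation is near zero because Cov[X, X^2] = E[X^3] = 0.
pearson = np.corrcoef(x, y)[0, 1]
# A non-linear statistic exposes the dependence: x^2 and y coincide.
nonlinear = np.corrcoef(x**2, y)[0, 1]
print(f"Corr(x, y) = {pearson:.4f}, Corr(x^2, y) = {nonlinear:.4f}")
```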

In order to overcome the linear dependence limitation, several generalizations of LCCA have been
proposed in the literature. In [11] an information-theoretic approach to canonical correlation analysis,
called ICCA, was proposed. This method generates a sequence of pairwise unit variance linear combinations
of the considered random vectors, such that the mutual information (MI) [12] between the elements of
each pair is maximal, and each pair is uncorrelated with its predecessors. Since the MI is a general
measure of statistical dependence, which is sensitive to non-linear relationships, ICCA [11] is capable
of capturing pairs of linear combinations exhibiting non-linear dependence. However, in contrast to LCCA,
ICCA does not reduce to a simple GEVD problem. Indeed, in [11] each pair of linear combinations
must be obtained separately via an iterative Newton-Raphson [13] algorithm, which may converge to
undesired local maxima. Moreover, each step of the Newton-Raphson algorithm involves re-estimation of
the MI in a non-parametric manner at a potentially high computational cost.

Another approach to non-linear generalization of LCCA is kernel canonical correlation analysis (KCCA)
[14]-[16]. KCCA applies LCCA to high-dimensional non-linear transformations of the considered random
vectors that map them into some reproducing kernel Hilbert spaces. Although the KCCA approach can be
successful in extracting non-linear relations [16], [17]-[19], it suffers from the following drawbacks. First,
the high-dimensional mappings may have high computational complexity. Second, the method is highly
prone to over-fitting errors, and requires regularization of the covariance matrices of the transformed
random vectors to increase numerical stability. Finally, the non-linear mappings of the random vectors
may mask the dependencies between their original coordinates.

In this paper we propose a different non-linear generalization of LCCA called measure transformed
canonical correlation analysis (MTCCA). We apply a structured transform to the joint probability
distribution of the considered pair of random vectors, i.e., a transformation of the joint probability measure
defined on their joint observation space. The proposed transform is structured by a pair of non-negative
functions called the MT-functions. It preserves statistical independence and maps the joint probability
distribution into a set of probability measures on the joint observation space. By modifying the
MT-functions, classes of measure transformations with different properties can be obtained. Two types of
MT-functions, the exponential and the Gaussian, are developed in this paper. The former has a translation
invariance property while the latter has a localization property.

MTCCA applies LCCA to the considered pair of random vectors under the proposed probability
measure transform. By modifying the MT-functions the correlation coefficient under the transformed
probability measure, called the MT-correlation coefficient, is modified, resulting in a new general
framework for canonical correlation analysis. In MTCCA, the MT-correlation coefficients between the elements
of each generated pair of linear combinations are called the MT-canonical correlation coefficients.

The MT-functions are selected from exponential and Gaussian families of functions parameterized by
scale and location parameters. Under these function classes it is shown that pairs of linear combinations
with non-linear dependence can be detected by MTCCA. The parameters of the MT-functions are selected
via  maximization  of  a  lower  bound  on  the  largest  MT-canonical  correlation  coefficient.  We  show  that, 
for  these  selected  parameters,  the  corresponding  largest  MT-canonical  correlation  coefficient  constitutes 
a  measure  for  statistical  independence  under  the  original  probability  distribution.  In  this  case  it  is  also 
shown  that  the  considered  random  vectors  are  statistically  independent  under  both  transformed  and  original 
probability  distributions  if  and  only  if  they  are  uncorrelated  under  the  transformed  probability  distribution. 

In the paper an empirical implementation of MTCCA is proposed that uses strongly consistent
estimators of the measure transformed covariance and cross-covariance matrices of the considered random
vectors.

The MTCCA approach has the following advantages over the LCCA, ICCA, and KCCA approaches discussed
above: 1) In contrast to LCCA, MTCCA is capable of detecting non-linear dependencies. Moreover,
under appropriate selection of the MT-functions, the largest MT-canonical correlation coefficient is a
measure of statistical independence between the considered random vectors. 2) In comparison to ICCA,
MTCCA is easier to implement for the following reasons. First, it reduces to a simple GEVD problem,
which only involves the measure transformed covariance and cross-covariance matrices of the considered
random vectors. Second, while MTCCA with exponential and Gaussian MT-functions involves a single
maximization for choosing the MT-function parameters, ICCA involves a sequence of maximization
problems, each having the same dimensionality as in MTCCA. 3) In the paper we show that, unlike
the empirical ICCA and KCCA, the computational complexity of the empirical MTCCA is linear in the
sample size, which makes it favorable in large sample size scenarios. 4) Unlike KCCA, MTCCA does
not expand the dimensions of the random vectors, nor does it require regularization of their measure
transformed covariance matrices. 5) Finally, unlike KCCA, in MTCCA the original coordinates of the
observation vectors are retained after the probability measure transform. Therefore, MTCCA can be easily
applied to variable selection [20] by discarding a subset of the variables for which the corresponding
entries of the measure transformed canonical directions are practically zero.

The proposed approach is illustrated for two applications. The first is a simulation of graphical models
with known dependency structure. In this simulated example we show that, similarly to ICCA,
MTCCA outperforms LCCA in selecting a valid linear/non-linear graphical model topology. The second
application is construction of networks that analyze long-term associations between companies traded in
the NASDAQ and NYSE stock markets. We show that MTCCA and KCCA better associate companies
in the same sector (technology, pharmaceutical, financial) than do LCCA and ICCA. Furthermore,
MTCCA is able to achieve this by finding strong non-linear dependencies between the daily log-returns
of these companies.

The  paper  is  organized  as  follows.  In  Section  II,  LCCA  is  reviewed.  In  Section  III,  LCCA  is  generalized 
by  applying  a  transform  to  the  joint  probability  distribution.  Selection  of  the  MT-functions  associated 
with  the  transform  is  discussed  in  Section  IV.  In  Section  V,  empirical  implementation  of  MTCCA  is 
obtained.  In  Section  VI,  the  proposed  approach  is  illustrated  via  simulation  experiment.  In  Section  VII, 
the  main  points  of  this  contribution  are  summarized.  The  propositions  and  theorems  stated  throughout 
the  paper  are  proved  in  the  Appendix. 


II.  Linear  canonical  correlation  analysis:  Review 


A.  Preliminaries 

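Let $X$ and $Y$ denote two random vectors, whose observation spaces are given by $\mathcal{X} \subseteq \mathbb{R}^p$ and $\mathcal{Y} \subseteq \mathbb{R}^q$,
respectively. We define the measure space $(\mathcal{X} \times \mathcal{Y}, \mathcal{S}_{\mathcal{X} \times \mathcal{Y}}, P_{XY})$, where $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$ is a $\sigma$-algebra over $\mathcal{X} \times \mathcal{Y}$,
and $P_{XY}$ is the joint probability measure on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$. The marginal probability measures of $P_{XY}$ on $\mathcal{S}_{\mathcal{X}}$
and $\mathcal{S}_{\mathcal{Y}}$ are denoted by $P_X$ and $P_Y$, where $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$ are the $\sigma$-algebras over $\mathcal{X}$ and $\mathcal{Y}$, respectively. Let
$g(\cdot, \cdot)$ denote an integrable scalar function on $\mathcal{X} \times \mathcal{Y}$. The expectation of $g(X, Y)$ under $P_{XY}$ is defined
as

$$E[g(X, Y); P_{XY}] \triangleq \int_{\mathcal{X} \times \mathcal{Y}} g(x, y) \, dP_{XY}(x, y), \tag{1}$$

where $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. The random vectors $X$ and $Y$ will be said to be statistically independent under
$P_{XY}$ if

$$E[g_1(X) g_2(Y); P_{XY}] = E[g_1(X); P_X] \, E[g_2(Y); P_Y] \tag{2}$$

for all integrable scalar functions $g_1(\cdot)$, $g_2(\cdot)$ on $\mathcal{X}$ and $\mathcal{Y}$, respectively. The random vectors $X$ and $Y$
will be said to be uncorrelated under $P_{XY}$ if

$$E[X Y^T; P_{XY}] = E[X; P_X] \, E[Y^T; P_Y], \tag{3}$$

where $(\cdot)^T$ denotes the transpose operator.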


B.  The  LCCA  procedure 

LCCA generates a sequence of pairwise unit-variance linear combinations $(a_k^T X, b_k^T Y)$, $k = 1, \ldots, r = \min(p, q)$,
in the following manner. The first pair $(a_1^T X, b_1^T Y)$ is determined by maximizing the Pearson
correlation coefficient between $a^T X$ and $b^T Y$ over $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^q$ with the constraint that both
$a^T X$ and $b^T Y$ have unit variance. Similarly, the $k$-th pair $(a_k^T X, b_k^T Y)$ ($1 < k \le r$) is determined by
maximizing the Pearson correlation coefficient between $a^T X$ and $b^T Y$ over $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^q$ with the
constraints that both $a^T X$ and $b^T Y$ have unit variance and $(a^T X, b^T Y)$ are uncorrelated with all the
previously obtained pairs $(a_l^T X, b_l^T Y)$, $l = 1, \ldots, k - 1$. The pairs $(a_k, b_k)$ and $(a_k^T X, b_k^T Y)$ are called
the $k$-th order linear canonical directions and the $k$-th order linear canonical variates, respectively.
The Pearson correlation coefficient between $a_k^T X$ and $b_k^T Y$ is called the $k$-th order linear canonical
correlation coefficient.

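The Pearson correlation coefficient between $a^T X$ and $b^T Y$ under $P_{XY}$ is given by

$$\mathrm{Corr}[a^T X, b^T Y; P_{XY}] = \frac{\mathrm{Cov}[a^T X, b^T Y; P_{XY}]}{\sqrt{\mathrm{Var}[a^T X; P_X]} \sqrt{\mathrm{Var}[b^T Y; P_Y]}} = \frac{a^T \Sigma_{XY} b}{\sqrt{a^T \Sigma_X a} \sqrt{b^T \Sigma_Y b}}, \tag{4}$$

where $\mathrm{Var}[\cdot; P_X]$ and $\mathrm{Cov}[\cdot, \cdot; P_{XY}]$ denote the variance and covariance under $P_X$ and $P_{XY}$, respectively.
The last equality in (4) can be easily verified using the basic definitions of variance and covariance,
where $\Sigma_X \in \mathbb{R}^{p \times p}$, $\Sigma_Y \in \mathbb{R}^{q \times q}$ and $\Sigma_{XY} \in \mathbb{R}^{p \times q}$ denote the covariance matrix of $X$ under $P_X$, the
covariance matrix of $Y$ under $P_Y$, and their cross-covariance matrix under $P_{XY}$, respectively, and it is
assumed that $\Sigma_X$ and $\Sigma_Y$ are non-singular.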

Hence, LCCA solves the following constrained maximization sequentially over $k = 1, \ldots, r$:

$$\rho_k(\Sigma_X, \Sigma_Y, \Sigma_{XY}) = \max_{a, b} \; a^T \Sigma_{XY} b, \tag{5}$$
$$\text{s.t.} \quad a^T \Sigma_X a = b^T \Sigma_Y b = 1,$$
$$\text{and} \quad a^T \Sigma_{XY} b_l = b^T \Sigma_{XY}^T a_l = a^T \Sigma_X a_l = b^T \Sigma_Y b_l = 0 \quad \forall \; 1 \le l < k,$$

where $\rho_k(\Sigma_X, \Sigma_Y, \Sigma_{XY})$ denotes the $k$-th order linear canonical correlation coefficient. Since the number
of constraints in (5) increases with $k$, it is implied that the linear canonical correlation coefficients satisfy
the following order relation: $1 \ge \rho_1(\Sigma_X, \Sigma_Y, \Sigma_{XY}) \ge \ldots \ge \rho_r(\Sigma_X, \Sigma_Y, \Sigma_{XY}) \ge 0$.

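It is well known that the constrained maximization problem in (5) reduces to the set of $r$ distinct
solutions of the following generalized eigenvalue problem [21]:

$$\begin{bmatrix} 0 & \Sigma_{XY} \\ \Sigma_{XY}^T & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \rho \begin{bmatrix} \Sigma_X & 0 \\ 0 & \Sigma_Y \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}, \tag{6}$$

where $\rho = \rho_k(\Sigma_X, \Sigma_Y, \Sigma_{XY})$ is the $k$-th largest generalized eigenvalue of the pencil in (6), and
$[a^T, b^T]^T = [a_k^T, b_k^T]^T$ is its corresponding generalized eigenvector.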
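
The reduction from (5) to (6) can be sketched numerically. The snippet below is a hedged illustration, not the authors' implementation: the function name `lcca` and the toy covariance matrices are assumptions, and the generalized problem is reduced to a standard symmetric one through a Cholesky factor of the right-hand block-diagonal matrix.

```python
import numpy as np

def lcca(Sx, Sy, Sxy):
    """Canonical correlations and directions from the GEVD (6)."""
    p, q = Sxy.shape
    A = np.block([[np.zeros((p, p)), Sxy], [Sxy.T, np.zeros((q, q))]])
    B = np.block([[Sx, np.zeros((p, q))], [np.zeros((q, p)), Sy]])
    # Reduce A v = rho B v to a standard symmetric problem via B = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(B))
    w, Z = np.linalg.eigh(L_inv @ A @ L_inv.T)
    V = L_inv.T @ Z                          # generalized eigenvectors
    top = np.argsort(w)[::-1][: min(p, q)]   # indices of the r largest eigenvalues
    # Each eigenvector satisfies a^T Sx a + b^T Sy b = 1; when rho > 0 the two
    # terms are equal, so scaling by sqrt(2) gives unit-variance variates.
    return w[top], np.sqrt(2) * V[:p, top], np.sqrt(2) * V[p:, top]

# Toy covariances (assumed): identity marginals, diagonal cross-covariance.
Sx, Sy = np.eye(2), np.eye(2)
Sxy = np.diag([0.8, 0.3])
rho, a, b = lcca(Sx, Sy, Sxy)
print(rho)
```

With identity marginal covariances the canonical correlations are simply the singular values of the cross-covariance matrix, which makes the toy case easy to check by hand.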


III.  Measure  transformed  canonical  correlation  analysis 

In this section LCCA is generalized by applying a transform to the joint probability measure $P_{XY}$.
First, we derive a transform that maps $P_{XY}$ into a set of joint probability measures $\{Q_{XY}^{(u,v)}\}$ on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$,
each of which preserves statistical independence of $X$ and $Y$ under $P_{XY}$. The MTCCA
method is then obtained by applying LCCA to $X$ and $Y$ under the transformed probability measure $Q_{XY}^{(u,v)}$.


A.  Transformation  of  the  joint  probability  measure  PXY 

Definition 1. Given two non-negative functions $u : \mathbb{R}^p \to \mathbb{R}$ and $v : \mathbb{R}^q \to \mathbb{R}$ satisfying

$$0 < E[u(X) v(Y); P_{XY}] < \infty, \tag{7}$$

a transform on the joint probability measure $P_{XY}$ is defined via the following relation:

$$Q_{XY}^{(u,v)}(A) \triangleq T_{u,v}[P_{XY}](A) = \int_A \varphi_{u,v}(x, y) \, dP_{XY}(x, y), \tag{8}$$

where $A \in \mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, and

$$\varphi_{u,v}(x, y) \triangleq \frac{u(x) v(y)}{E[u(X) v(Y); P_{XY}]}. \tag{9}$$

The functions $u(\cdot)$ and $v(\cdot)$, associated with the transform $T_{u,v}[\cdot]$, are called the MT-functions.

In the following Proposition, some properties of the measure transform (8) are given.

Proposition 1. Let $Q_{XY}^{(u,v)}$ be defined by relation (8). Then
1) $Q_{XY}^{(u,v)}$ is a probability measure on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$.
2) $Q_{XY}^{(u,v)}$ is absolutely continuous w.r.t. $P_{XY}$, with Radon-Nikodym derivative [33] given by

$$\frac{dQ_{XY}^{(u,v)}(x, y)}{dP_{XY}(x, y)} = \varphi_{u,v}(x, y). \tag{10}$$

3) If $X$ and $Y$ are statistically independent under $P_{XY}$, then they are statistically independent under
$Q_{XY}^{(u,v)}$.
4) Assume that the MT-functions $u(\cdot)$ and $v(\cdot)$ are strictly positive. If $X$ and $Y$ are statistically
independent under $Q_{XY}^{(u,v)}$, then they are statistically independent under $P_{XY}$.

[A proof is given in Appendix A]

By modifying the MT-functions $u(\cdot)$ and $v(\cdot)$, such that the conditions in Definition 1 are satisfied,
an infinite set of joint probability measures on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$ can be obtained.


B.  The  MTCCA  procedure 

MTCCA generates a sequence of pairwise linear combinations $(a_k^T X, b_k^T Y)$, $k = 1, \ldots, r = \min(p, q)$,
that have the following properties under the transformed probability measure $Q_{XY}^{(u,v)}$: $a_k^T X$ and $b_k^T Y$
have unit variance, the correlation coefficient between $a_k^T X$ and $b_k^T Y$ is maximal, and $(a_k^T X, b_k^T Y)$
are uncorrelated with $(a_l^T X, b_l^T Y)$ for all $1 \le l < k$. In MTCCA, the pairs $(a_k, b_k)$ and $(a_k^T X, b_k^T Y)$
are called the $k$-th order MT-canonical directions and the $k$-th order MT-canonical variates, respectively.
The correlation coefficient between $a_k^T X$ and $b_k^T Y$ under $Q_{XY}^{(u,v)}$ is called the $k$-th order MT-canonical
correlation coefficient.


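The correlation coefficient between $a^T X$ and $b^T Y$ under $Q_{XY}^{(u,v)}$ is given by

$$\mathrm{Corr}[a^T X, b^T Y; Q_{XY}^{(u,v)}] = \frac{\mathrm{Cov}[a^T X, b^T Y; Q_{XY}^{(u,v)}]}{\sqrt{\mathrm{Var}[a^T X; Q_X^{(u,v)}]} \sqrt{\mathrm{Var}[b^T Y; Q_Y^{(u,v)}]}} = \frac{a^T \Sigma_{XY}^{(u,v)} b}{\sqrt{a^T \Sigma_X^{(u,v)} a} \sqrt{b^T \Sigma_Y^{(u,v)} b}}, \tag{11}$$

where $\mathrm{Corr}[\cdot, \cdot; Q_{XY}^{(u,v)}]$ is called the MT-correlation coefficient, and the measures $Q_X^{(u,v)}$ and $Q_Y^{(u,v)}$ are
the marginal probability measures of $Q_{XY}^{(u,v)}$ on $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$, respectively. The matrices $\Sigma_X^{(u,v)}$, $\Sigma_Y^{(u,v)}$ and $\Sigma_{XY}^{(u,v)}$
denote the covariance matrix of $X$ under $Q_X^{(u,v)}$, the covariance matrix of $Y$ under $Q_Y^{(u,v)}$, and
their cross-covariance matrix under $Q_{XY}^{(u,v)}$, respectively, where it is assumed that $\Sigma_X^{(u,v)}$ and $\Sigma_Y^{(u,v)}$ are
non-singular.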


Using (1) and (10) it can be shown that $E[G(X, Y); Q_{XY}^{(u,v)}] = E[G(X, Y) \varphi_{u,v}(X, Y); P_{XY}]$,
where $G(X, Y)$ is some arbitrary matrix function of $X$ and $Y$. Therefore, one can easily verify that

$$\Sigma_X^{(u,v)} = E[X X^T \varphi_{u,v}(X, Y); P_{XY}] - E[X \varphi_{u,v}(X, Y); P_{XY}] \, E[X^T \varphi_{u,v}(X, Y); P_{XY}], \tag{12}$$

$$\Sigma_Y^{(u,v)} = E[Y Y^T \varphi_{u,v}(X, Y); P_{XY}] - E[Y \varphi_{u,v}(X, Y); P_{XY}] \, E[Y^T \varphi_{u,v}(X, Y); P_{XY}], \tag{13}$$

and

$$\Sigma_{XY}^{(u,v)} = E[X Y^T \varphi_{u,v}(X, Y); P_{XY}] - E[X \varphi_{u,v}(X, Y); P_{XY}] \, E[Y^T \varphi_{u,v}(X, Y); P_{XY}]. \tag{14}$$

Equations (12)-(14) imply that $\Sigma_X^{(u,v)}$, $\Sigma_Y^{(u,v)}$ and $\Sigma_{XY}^{(u,v)}$ are weighted covariance and cross-covariance
matrices of $X$ and $Y$ under $P_{XY}$, with weighting function $\varphi_{u,v}(\cdot, \cdot)$.
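
The change-of-measure identity behind (12)-(14) can be checked on a case with a known answer. The sketch below assumes a scalar X that is standard normal under P_X, u(x) = exp(s x) and v(y) = 1 (illustrative choices): exponential tilting of N(0, 1) yields N(s, 1), so the weighted mean should be close to s and the weighted variance close to 1. Expectations are replaced by sample averages, and the normalizer in (9) by a sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)      # samples of X under P_X = N(0, 1)

s = 0.5                               # assumed tilt parameter
u = np.exp(s * x)                     # u(x) = exp(s x); v(y) = 1 is implicit
phi = u / u.mean()                    # sample analogue of the weight in (9)

# E[G; Q] = E[G * phi; P], applied to G = X and G = X^2
mean_q = np.mean(x * phi)
var_q = np.mean(x**2 * phi) - mean_q**2   # scalar case of (12)
print(mean_q, var_q)
```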

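MTCCA solves the following constrained maximization sequentially over $k = 1, \ldots, r$:

$$\rho_k(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)}) = \max_{a, b} \; a^T \Sigma_{XY}^{(u,v)} b, \tag{15}$$
$$\text{s.t.} \quad a^T \Sigma_X^{(u,v)} a = b^T \Sigma_Y^{(u,v)} b = 1,$$
$$\text{and} \quad a^T \Sigma_{XY}^{(u,v)} b_l = b^T \Sigma_{XY}^{(u,v)T} a_l = a^T \Sigma_X^{(u,v)} a_l = b^T \Sigma_Y^{(u,v)} b_l = 0 \quad \forall \; 1 \le l < k,$$

where $\rho_k(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)})$ denotes the $k$-th order MT-canonical correlation coefficient. Since the
number of constraints in (15) increases with $k$, the MT-canonical correlation coefficients satisfy the
following order relation: $1 \ge \rho_1(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)}) \ge \ldots \ge \rho_r(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)}) \ge 0$.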

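Similarly to (5), the constrained maximization problem in (15) reduces to the following generalized
eigenvalue problem:

$$\begin{bmatrix} 0 & \Sigma_{XY}^{(u,v)} \\ \Sigma_{XY}^{(u,v)T} & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \rho \begin{bmatrix} \Sigma_X^{(u,v)} & 0 \\ 0 & \Sigma_Y^{(u,v)} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}, \tag{16}$$

where $\rho = \rho_k(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)})$ is the $k$-th largest generalized eigenvalue of the pencil in (16), and
$[a^T, b^T]^T = [a_k^T, b_k^T]^T$ is its corresponding generalized eigenvector.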

By modifying the MT-functions $u(\cdot)$ and $v(\cdot)$, such that the condition in (7) is satisfied, the
MT-correlation coefficient under $Q_{XY}^{(u,v)}$ is modified, resulting in a family of canonical correlation analyses
generalizing the LCCA described in Subsection II-B. In particular, by choosing $u(x) = 1$ and $v(y) = 1$,
we have $Q_{XY}^{(u,v)} = P_{XY}$ and $\mathrm{Corr}[a^T X, b^T Y; Q_{XY}^{(u,v)}] = \mathrm{Corr}[a^T X, b^T Y; P_{XY}]$, and LCCA is obtained.
Other choices of $u(\cdot)$ and $v(\cdot)$ are discussed below.
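
To make the procedure concrete, here is a rough sketch of MTCCA on simulated data, under several assumptions that are not from the paper: the data model (the first entry of Y is the square of the first entry of X plus noise), the fixed weight parameter `s`, and the helper names. The canonical correlations are computed as singular values of the whitened cross-covariance, which coincide with the non-negative generalized eigenvalues of (16).

```python
import numpy as np

def weighted_cov(X, Y, phi):
    """Plug-in versions of (12)-(14) for sample weights phi with mean one."""
    n = len(phi)
    Xw, Yw = X * phi[:, None], Y * phi[:, None]
    mx, my = Xw.mean(0), Yw.mean(0)
    Sx = Xw.T @ X / n - np.outer(mx, mx)
    Sy = Yw.T @ Y / n - np.outer(my, my)
    Sxy = Xw.T @ Y / n - np.outer(mx, my)
    return Sx, Sy, Sxy

def canonical_correlations(Sx, Sy, Sxy):
    """Non-negative generalized eigenvalues of (16), largest first."""
    Lx_inv = np.linalg.inv(np.linalg.cholesky(Sx))
    Ly_inv = np.linalg.inv(np.linalg.cholesky(Sy))
    return np.linalg.svd(Lx_inv @ Sxy @ Ly_inv.T, compute_uv=False)

rng = np.random.default_rng(2)
N = 50_000
X = rng.standard_normal((N, 2))
Y = np.column_stack([X[:, 0] ** 2 + 0.1 * rng.standard_normal(N),
                     rng.standard_normal(N)])

ones = np.ones(N)                              # u = v = 1 recovers LCCA
rho_lcca = canonical_correlations(*weighted_cov(X, Y, ones))

s = np.array([0.7, 0.0])                       # illustrative, not optimized
w = np.exp(X @ s)                              # u(x) = exp(s^T x), v(y) = 1
rho_mt = canonical_correlations(*weighted_cov(X, Y, w / w.mean()))
print(rho_lcca[0], rho_mt[0])
```

In this toy setting the largest LCCA coefficient stays near zero, while the measure-transformed one is clearly positive, because the tilt shifts the mass of X to a region where the parabola is locally increasing.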


IV.  Selection  of  the  MT-functions 

In this section we parameterize the MT-functions $u(x; s)$ and $v(y; t)$ with parameters $s \in \mathbb{R}^p$ and
$t \in \mathbb{R}^q$ under the exponential and Gaussian families of functions. This will result in the corresponding
cross-covariance matrix $\Sigma_{XY}^{(u,v)}(s, t)$ gaining sensitivity to non-linear relationships between the entries of
$X$ and $Y$. Optimal choice of the parameters $s$ and $t$ is also discussed.
A. Exponential MT-functions

Let $u(\cdot; \cdot)$ and $v(\cdot; \cdot)$ be defined as the parameterized functions

$$u_E(x; s) = \exp(s^T x) \quad \text{and} \quad v_E(y; t) = \exp(t^T y), \tag{17}$$


where $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$. Using (9), (14) and (17) one can easily verify that the cross-covariance matrix
of $X$ and $Y$ under $Q_{XY}^{(u_E,v_E)}$ takes the form

$$\Sigma_{XY}^{(u_E,v_E)}(s, t) = \frac{\partial^2 \log M_{XY}(s, t)}{\partial s \, \partial t^T}, \tag{18}$$

where

$$M_{XY}(s, t) \triangleq E[\exp(s^T X + t^T Y); P_{XY}] \tag{19}$$

is the joint moment generating function of $X$ and $Y$, and it is assumed that $M_{XY}(s, t)$ is finite in some
open region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. Note that the cross-covariance matrix in (18) involves
higher-order statistics of $X$ and $Y$. Additionally, observe that $\Sigma_{XY}^{(u_E,v_E)}(s, t)$ reduces to the standard
cross-covariance matrix $\Sigma_{XY}$ for $s = 0$ and $t = 0$. Finally, note that the quantity in (18) has been
proposed in [22]-[28] for blind source separation, blind channel estimation, blind channel equalization,
and auto-regression parameter estimation. To the best of our knowledge, this paper is the first to propose
this quantity for generalizing LCCA.

In the following Theorem, which follows directly from (18) and the properties of $M_{XY}(s, t)$ [29], [30],
one sees that $\Sigma_{XY}^{(u_E,v_E)}(s, t)$ preserves statistical independence and can capture non-linear dependencies
when they exist.


Theorem 1. Let $U$ denote an arbitrary open region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin, and assume that
$M_{XY}(s, t)$ is finite on $U$. The random vectors $X$ and $Y$ are statistically independent under the joint
probability measure $P_{XY}$ if and only if

$$\Sigma_{XY}^{(u_E,v_E)}(s, t) = 0 \quad \forall \; (s, t) \in U. \tag{20}$$

[A proof is given in Appendix B]


The “if” is the interesting part of the theorem since the “only if” part follows directly from Property 3
of Proposition 1. In particular, if $X$ and $Y$ are statistically dependent under $P_{XY}$, then there exist $a \in \mathbb{R}^p$,
$b \in \mathbb{R}^q$, $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$, such that $a^T \Sigma_{XY}^{(u_E,v_E)}(s, t) \, b \ne 0$. Thus, (11) implies that if $X$ and $Y$ are
statistically dependent under $P_{XY}$ then there exist linear combinations of the form $a^T X$ and $b^T Y$ whose
MT-correlation coefficient under $Q_{XY}^{(u_E,v_E)}$ is non-zero.
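
A worked scalar case may help, assuming X standard normal and Y = X^2 (an illustration, not an example from the paper). Here M_XY(s, t) = (1 - 2t)^{-1/2} exp(s^2 / (2(1 - 2t))) for t < 1/2, so (18) gives 2s / (1 - 2t)^2: zero at the origin, where the standard cross-covariance sits, and non-zero as soon as s is non-zero. The sketch below checks this with sample averages in place of expectations; the function name is hypothetical.

```python
import numpy as np

def exp_cross_cov(x, y, s, t):
    """Scalar sample version of (14) with the exponential MT-functions (17)."""
    w = np.exp(s * x + t * y)
    phi = w / w.mean()                 # sample analogue of (9)
    return np.mean(x * y * phi) - np.mean(x * phi) * np.mean(y * phi)

rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)
y = x**2                               # dependent on x, yet Cov[X, Y] = 0

c0 = exp_cross_cov(x, y, 0.0, 0.0)     # standard cross-covariance, theory: 0
c1 = exp_cross_cov(x, y, 0.5, 0.0)     # theory: 2 * s / (1 - 2 * t)**2 = 1.0
print(c0, c1)
```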
Finally, we show that MTCCA with the exponential MT-functions in (17) is translation-invariant. Let
$X' = X + \alpha$ and $Y' = Y + \beta$, where $\alpha$ and $\beta$ are deterministic vectors in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively.
According to (9) and (17), $\varphi_{u_E,v_E}(X, Y) = \varphi_{u_E,v_E}(X', Y')$. Therefore, by (12)-(14), $\Sigma_{X'}^{(u_E,v_E)}(s, t) = \Sigma_X^{(u_E,v_E)}(s, t)$,
$\Sigma_{Y'}^{(u_E,v_E)}(s, t) = \Sigma_Y^{(u_E,v_E)}(s, t)$, and $\Sigma_{X'Y'}^{(u_E,v_E)}(s, t) = \Sigma_{XY}^{(u_E,v_E)}(s, t)$. Thus, by (15), the MT-canonical
correlation coefficients are invariant to translation, i.e.,

$$\rho_k\left(\Sigma_{X'}^{(u_E,v_E)}(s, t), \Sigma_{Y'}^{(u_E,v_E)}(s, t), \Sigma_{X'Y'}^{(u_E,v_E)}(s, t)\right) = \rho_k\left(\Sigma_X^{(u_E,v_E)}(s, t), \Sigma_Y^{(u_E,v_E)}(s, t), \Sigma_{XY}^{(u_E,v_E)}(s, t)\right)$$

for $k = 1, \ldots, r$.
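
The invariance also holds exactly for the sample-based plug-in version of (14), since the factor exp(s^T alpha + t^T beta) cancels when the weights are normalized by their sample mean. A small check, with an assumed data model and assumed shift vectors:

```python
import numpy as np

def exp_mt_cross_cov(X, Y, s, t):
    """Sample version of (14) under the exponential MT-functions (17)."""
    w = np.exp(X @ s + Y @ t)
    phi = w / w.mean()                           # shift factors cancel here
    mx, my = (X * phi[:, None]).mean(0), (Y * phi[:, None]).mean(0)
    return (X * phi[:, None]).T @ Y / len(phi) - np.outer(mx, my)

rng = np.random.default_rng(4)
X = rng.standard_normal((5_000, 2))
Y = np.tanh(X) + 0.3 * rng.standard_normal((5_000, 2))
s, t = np.array([0.4, -0.2]), np.array([0.1, 0.3])
alpha, beta = np.array([3.0, -1.0]), np.array([-2.0, 0.5])   # deterministic shifts

C = exp_mt_cross_cov(X, Y, s, t)
C_shift = exp_mt_cross_cov(X + alpha, Y + beta, s, t)
print(np.max(np.abs(C - C_shift)))               # differs only by rounding error
```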


B.  Gaussian  MT-functions 

Next we define the MT-functions $u(\cdot; \cdot, \cdot)$ and $v(\cdot; \cdot, \cdot)$ by

$$u_G(x; s, \sigma) = \frac{1}{(2\pi\sigma^2)^{p/2}} \exp\left(-\frac{\|x - s\|_2^2}{2\sigma^2}\right) \quad \text{and} \quad v_G(y; t, \tau) = \frac{1}{(2\pi\tau^2)^{q/2}} \exp\left(-\frac{\|y - t\|_2^2}{2\tau^2}\right), \tag{21}$$

where $s \in \mathbb{R}^p$, $t \in \mathbb{R}^q$, $\sigma \in \mathbb{R}_+$, $\tau \in \mathbb{R}_+$, and $\|\cdot\|_2$ denotes the $\ell_2$-norm. Since $u_G(\cdot; \cdot, \cdot)$ and $v_G(\cdot; \cdot, \cdot)$
are strictly positive and bounded, one can easily verify that the condition in (7) is satisfied. Relations (9)
and (14) imply that the MT-functions (21) produce a weighted cross-covariance matrix, for which the
observations are weighted in inverse proportion to the distances $\|x - s\|_2$ and $\|y - t\|_2$. Hence, the
resulting MT-correlation coefficient is a measure of local linear dependence in the vicinity of $(s, t)$. We
note that local linear dependence exists whenever there are global non-linear dependencies.

Sensitivity of $\Sigma_{XY}^{(u_G,v_G)}(s, t)$ to non-linear relationships between $X$ and $Y$ is shown via the following
Theorem.


Theorem 2. Let $\sigma$, $\tau$ be fixed and positive. Additionally, let $U$ denote an arbitrary open region in
$\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. The random vectors $X$ and $Y$ are statistically independent under the
joint probability measure $P_{XY}$ if and only if

$$\Sigma_{XY}^{(u_G,v_G)}(s, t) = 0 \quad \forall \; (s, t) \in U. \tag{22}$$

[A proof is given in Appendix C]

Hence, if $X$ and $Y$ are statistically dependent under $P_{XY}$, then there exist $a \in \mathbb{R}^p$, $b \in \mathbb{R}^q$, $s \in \mathbb{R}^p$
and $t \in \mathbb{R}^q$, such that $a^T \Sigma_{XY}^{(u_G,v_G)}(s, t) \, b \ne 0$. Therefore, again, non-linear dependencies can be detected
using MTCCA.
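
The localization effect can be illustrated on the scalar pair Y = X^2 with X standard normal (an assumed example). Centering the Gaussian weights at a point on the right branch of the parabola should give a positive local cross-covariance, and centering them on the left branch a negative one. The width values and the function name below are assumptions chosen for illustration.

```python
import numpy as np

def gauss_cross_cov(x, y, s, t, sigma, tau):
    """Scalar sample version of (14) with the Gaussian MT-functions (21)."""
    # Normalizing constants of (21) cancel in the ratio (9), so they are omitted.
    w = np.exp(-(x - s) ** 2 / (2 * sigma**2) - (y - t) ** 2 / (2 * tau**2))
    phi = w / w.mean()
    return np.mean(x * y * phi) - np.mean(x * phi) * np.mean(y * phi)

rng = np.random.default_rng(5)
x = rng.standard_normal(100_000)
y = x**2

right = gauss_cross_cov(x, y, 1.0, 1.0, 0.5, 0.5)    # slope of x^2 at x = 1 is +2
left = gauss_cross_cov(x, y, -1.0, 1.0, 0.5, 0.5)    # slope at x = -1 is -2
print(right, left)
```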
C. Comparison between the exponential and Gaussian MT-functions

Unlike MTCCA with Gaussian MT-functions (21), MTCCA with exponential MT-functions (17) is
translation invariant. Moreover, in MTCCA with Gaussian MT-functions, in addition to the location
parameters $s$, $t$, which have the same dimensionality as the scaling parameters of the exponential
MT-functions, one has to set two width parameters $\sigma$ and $\tau$. On the other hand, unlike the exponential
MT-functions, the Gaussian MT-functions are bounded in the joint observation space $\mathcal{X} \times \mathcal{Y}$. Hence, MTCCA
with Gaussian MT-functions is more robust to outliers. Additionally, the Gaussian MT-functions have the
property that they localize linear dependence over the observation space. This property is illustrated in
Subsection VI-A. Additional common properties of the exponential and Gaussian MT-functions are given
in the following remarks:

Remark 1. Since the exponential and Gaussian MT-functions are strictly positive, by Property 4 of
Proposition 1 we conclude that $Q_{XY}^{(u_E,v_E)}$ and $Q_{XY}^{(u_G,v_G)}$ preserve statistical dependence under $P_{XY}$.

Remark 2. The exponential and Gaussian MT-functions preserve Gaussianity in the sense that if $X$ and
$Y$ are jointly Gaussian under $P_{XY}$, then they are jointly Gaussian under $Q_{XY}^{(u_E,v_E)}$ and $Q_{XY}^{(u_G,v_G)}$.


D. Selection of the MT-functions parameters

A natural choice of the parameters $s$ and $t$ would be those that maximize the first-order MT-canonical
correlation coefficient $\rho_1(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t))$ in (15). However, this maximization
is analytically cumbersome. Therefore, as an alternative, we propose maximizing a lower bound on
$\rho_1(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t))$. We show that the resultant first-order MT-canonical
correlation coefficient will be sensitive to dependence between $X$ and $Y$.


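Proposition 2. Define the following element-by-element average:

$$\psi\left(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)\right) \triangleq \left( \frac{1}{pq} \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{\left[\Sigma_{XY}^{(u,v)}(s, t)\right]_{i,j}^2}{\left[\Sigma_X^{(u,v)}(s, t)\right]_{i,i} \left[\Sigma_Y^{(u,v)}(s, t)\right]_{j,j}} \right)^{1/2}, \tag{23}$$

where $[A]_{i,j}$ denotes the $i,j$-th entry of $A$. Then

$$\psi\left(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)\right) \le \rho_1\left(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)\right). \tag{24}$$

[A proof is given in Appendix D]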
Proposition 2 suggests choosing the optimal MT-function parameters by maximizing the lower bound
in (24):

$$(s^*, t^*) = \arg\max_{(s,t) \in V} \; \psi\left(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)\right), \tag{25}$$

where $V$ is a closed region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. Under the MT-function pairs in (17) and
(21) one can verify that $\psi(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t))$ is continuous in $\mathbb{R}^p \times \mathbb{R}^q$. Therefore,
by the extreme value theorem [31] it has a maximum in $V$. The maximization problem in (25) can be
solved numerically, e.g., using gradient ascent [13] or greedy search over the region $V$.
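
As a hedged sketch of the numerical search, the snippet below runs an exhaustive grid search for scalar X and Y (p = q = 1), where the element-by-element average has a single term and equals the absolute MT-correlation. The data model, the grid standing in for the region V, and the function name `psi` are assumptions; gradient ascent would be the alternative for larger p + q.

```python
import numpy as np

def psi(x, y, s, t):
    """Scalar (p = q = 1) objective: |MT-correlation| under weights exp(s x + t y)."""
    w = np.exp(s * x + t * y)
    phi = w / w.mean()
    mx, my = np.mean(x * phi), np.mean(y * phi)
    sxy = np.mean(x * y * phi) - mx * my
    sx = np.mean(x * x * phi) - mx**2
    sy = np.mean(y * y * phi) - my**2
    return abs(sxy) / np.sqrt(sx * sy)

rng = np.random.default_rng(6)
x = rng.standard_normal(20_000)
y = x**2 + 0.1 * rng.standard_normal(20_000)

# Exhaustive search over a small closed region containing the origin.
grid_s = np.linspace(-1.0, 1.0, 21)
grid_t = np.linspace(-0.2, 0.2, 5)
scores = np.array([[psi(x, y, s, t) for t in grid_t] for s in grid_s])
i, j = np.unravel_index(np.argmax(scores), scores.shape)
s_star, t_star = grid_s[i], grid_t[j]
psi_origin = psi(x, y, 0.0, 0.0)     # value at s = t = 0, i.e. plain correlation
print(s_star, t_star, scores[i, j], psi_origin)
```

At the origin the objective is the absolute Pearson correlation, which is near zero for this data, while the maximizer lies away from the origin and yields a clearly larger value.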

The  following  theorem  justifies  the  use  of  the  first-order  MT-canonical  correlation  coefficient  as  a 
measure  of  statistical  independence. 


Theorem 3. The random vectors $X$ and $Y$ are statistically independent under $P_{XY}$ if and only if

$$\rho_1\left(\Sigma_X^{(u,v)}(s^*, t^*), \Sigma_Y^{(u,v)}(s^*, t^*), \Sigma_{XY}^{(u,v)}(s^*, t^*)\right) = 0,$$

where $(u, v)$ are the MT-functions in (17) or (21), and $(s^*, t^*)$ are selected according to (25). [A proof
is given in Appendix E]


Therefore, if the MT-functions and their parameters are selected as in Theorem 3, we conclude that
$X$ and $Y$ are statistically independent under $P_{XY}$ if and only if they are uncorrelated under $Q_{XY}^{(u,v)}$.
Hence, since by Property 3 of Proposition 1 $Q_{XY}^{(u,v)}$ preserves statistical independence under $P_{XY}$, we
also conclude that $X$ and $Y$ are statistically independent under $Q_{XY}^{(u,v)}$ if and only if they are uncorrelated
under $Q_{XY}^{(u,v)}$.


V.  Empirical  implementation  of  MTCCA 

Given $N$ i.i.d. samples of $(X, Y)$, an empirical version of MTCCA (15) can be implemented by
replacing $\Sigma_X^{(u,v)}$, $\Sigma_Y^{(u,v)}$ and $\Sigma_{XY}^{(u,v)}$ in (15), (16) and (25) with their sample covariance estimates. Hence,
strongly consistent estimators of $\Sigma_X^{(u,v)}$, $\Sigma_Y^{(u,v)}$ and $\Sigma_{XY}^{(u,v)}$ are constructed, based on $N$ i.i.d. samples of
$(X, Y)$.

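Proposition 3. Let $(X_n, Y_n)$, $n = 1, \ldots, N$, denote a sequence of i.i.d. samples from the joint distribution $P_{XY}$, and define the empirical covariance estimates

\[
\hat{\Sigma}_X^{(u,v)} \triangleq \frac{N}{N-1} \sum_{n=1}^{N} X_n X_n^T \hat{\varphi}_{u,v}(X_n, Y_n) - \frac{N}{N-1} \hat{\mu}_X^{(u,v)} \hat{\mu}_X^{(u,v)T}, \tag{26}
\]

\[
\hat{\Sigma}_Y^{(u,v)} \triangleq \frac{N}{N-1} \sum_{n=1}^{N} Y_n Y_n^T \hat{\varphi}_{u,v}(X_n, Y_n) - \frac{N}{N-1} \hat{\mu}_Y^{(u,v)} \hat{\mu}_Y^{(u,v)T}, \tag{27}
\]

and

\[
\hat{\Sigma}_{XY}^{(u,v)} \triangleq \frac{N}{N-1} \sum_{n=1}^{N} X_n Y_n^T \hat{\varphi}_{u,v}(X_n, Y_n) - \frac{N}{N-1} \hat{\mu}_X^{(u,v)} \hat{\mu}_Y^{(u,v)T}, \tag{28}
\]

where

\[
\hat{\mu}_X^{(u,v)} \triangleq \sum_{n=1}^{N} X_n \hat{\varphi}_{u,v}(X_n, Y_n), \qquad \hat{\mu}_Y^{(u,v)} \triangleq \sum_{n=1}^{N} Y_n \hat{\varphi}_{u,v}(X_n, Y_n), \tag{29}
\]

and

\[
\hat{\varphi}_{u,v}(X_n, Y_n) \triangleq \frac{u(X_n)\, v(Y_n)}{\sum_{m=1}^{N} u(X_m)\, v(Y_m)}. \tag{30}
\]

Assume

\[
E\left[u^4(X); P_X\right] < \infty, \quad E\left[v^4(Y); P_Y\right] < \infty \tag{31}
\]

and

\[
E\left[X_k^4; P_X\right] < \infty, \; k = 1, \ldots, p, \quad E\left[Y_l^4; P_Y\right] < \infty, \; l = 1, \ldots, q. \tag{32}
\]

Then $\hat{\Sigma}_X^{(u,v)} \to \Sigma_X^{(u,v)}$, $\hat{\Sigma}_Y^{(u,v)} \to \Sigma_Y^{(u,v)}$ and $\hat{\Sigma}_{XY}^{(u,v)} \to \Sigma_{XY}^{(u,v)}$ almost surely as $N \to \infty$. [A proof is given in Appendix F]

Note that for $u(X) = 1$ and $v(Y) = 1$, the estimators $\hat{\Sigma}_X^{(u,v)}$, $\hat{\Sigma}_Y^{(u,v)}$ and $\hat{\Sigma}_{XY}^{(u,v)}$ reduce to the standard unbiased estimators of the covariance and cross-covariance matrices $\Sigma_X$, $\Sigma_Y$ and $\Sigma_{XY}$, respectively.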
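The estimators of Proposition 3 can be sketched in a few lines of NumPy. The sketch assumes the following reading of (26)-(30): each sample receives the normalized weight $u(X_n)v(Y_n) / \sum_m u(X_m)v(Y_m)$, the weighted second moments are centered by the weighted means, and the result is scaled by $N/(N-1)$. The function name `mt_covariances` and the calling convention for `u` and `v` are hypothetical.

```python
import numpy as np

def mt_covariances(X, Y, u, v):
    """Empirical measure-transformed covariance and cross-covariance matrices.

    X: (N, p) samples, Y: (N, q) samples.
    u, v: callables mapping a sample matrix to a length-N vector of
          strictly positive MT-function values (hypothetical interface).
    """
    N = X.shape[0]
    w = u(X) * v(Y)
    phi = w / w.sum()                 # normalized weights, one per sample
    mu_x = phi @ X                    # weighted mean of X
    mu_y = phi @ Y                    # weighted mean of Y
    scale = N / (N - 1)
    Xw = X * phi[:, None]
    Yw = Y * phi[:, None]
    S_x = scale * (Xw.T @ X - np.outer(mu_x, mu_x))
    S_y = scale * (Yw.T @ Y - np.outer(mu_y, mu_y))
    S_xy = scale * (Xw.T @ Y - np.outer(mu_x, mu_y))
    return S_x, S_y, S_xy

# Sanity check: with u = v = 1 the estimators coincide with np.cov.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 3))
ones = lambda Z: np.ones(Z.shape[0])
S_x, S_y, S_xy = mt_covariances(X, Y, ones, ones)
joint = np.cov(np.hstack([X, Y]), rowvar=False)
```

The check with constant MT-functions mirrors the remark that the estimators reduce to the standard unbiased covariance estimators in that case.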

The empirical MTCCA procedure with the exponential and Gaussian MT-functions is given in Appendix G. In the first stage of the procedure, the parameters of the MT-functions are selected by solving a single $(p + q)$-dimensional maximization problem (64) using gradient ascent. It can be shown that each iteration of the gradient ascent algorithm, which only involves the empirical measure transformed covariance and cross-covariance matrices, has an asymptotic computational load (ACL) of $O((p + q)^2 N)$ flops. In the second stage, the empirical MT-canonical correlation coefficients and directions are obtained simultaneously by solving the GEVD problem (65) with an ACL of $O((p + q)^3)$ flops. Unlike the empirical MTCCA, the empirical ICCA [11] involves a sequence of $(p + q)$-dimensional numerical maximizations, one for each pair of canonical directions, using an iterative Newton-Raphson algorithm. It can be shown that each iteration of the Newton-Raphson algorithm, which involves re-estimation of the mutual information in a non-parametric manner and inversion of a Hessian matrix, has an ACL of $O((p + q) N^2 + (p + q + 2k)^3)$ flops, where $k$ denotes the index of the canonical directions pair. The empirical KCCA procedure [14]-[16], which involves computation of two $N \times N$ Gram matrices followed by solving a GEVD problem, has an ACL of $O((p + q) N^2 + N^3)$ flops. Hence, unlike the empirical ICCA and KCCA, the computational complexity of the empirical MTCCA is linear in $N$, which makes it favorable in large sample size scenarios.
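The second stage can be illustrated with a short sketch that extracts canonical correlation coefficients and directions from given covariance blocks. Note the substitution: instead of calling a GEVD solver, the sketch whitens with Cholesky factors and takes an SVD, which yields the same coefficients and directions when the covariance matrices are positive definite. The function name `canonical_correlations` and the toy data are assumptions made for illustration.

```python
import numpy as np

def canonical_correlations(S_x, S_y, S_xy):
    """Canonical correlations and directions from covariance blocks.

    Returns (rho, A, B) where columns a_k of A and b_k of B satisfy
    a_k^T S_x a_k = b_k^T S_y b_k = 1 and a_k^T S_xy b_k = rho[k].
    """
    L_x = np.linalg.cholesky(S_x)            # S_x = L_x L_x^T
    L_y = np.linalg.cholesky(S_y)            # S_y = L_y L_y^T
    K = np.linalg.solve(L_x, S_xy)           # L_x^{-1} S_xy
    K = np.linalg.solve(L_y, K.T).T          # L_x^{-1} S_xy L_y^{-T}
    U, rho, Vt = np.linalg.svd(K, full_matrices=False)
    A = np.linalg.solve(L_x.T, U)            # back-transform to original coordinates
    B = np.linalg.solve(L_y.T, Vt.T)
    return rho, A, B

# Toy data: one strong linear link plus independent noise coordinates.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = np.column_stack([X[:, 0] + 0.1 * rng.standard_normal(500),
                     rng.standard_normal(500)])
C = np.cov(np.hstack([X, Y]), rowvar=False)
S_x, S_y, S_xy = C[:3, :3], C[3:, 3:], C[:3, 3:]
rho, A, B = canonical_correlations(S_x, S_y, S_xy)
```

Passing measure-transformed covariance estimates instead of the plain sample covariances would give the corresponding MT-canonical quantities; the cost of this step depends only on $p$ and $q$, not on $N$.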

VI.  Numerical  examples 

In this section, we illustrate the use of empirical LCCA, ICCA, KCCA and MTCCA for graphical model selection. In every example below the empirical MTCCA was performed with the exponential and Gaussian MT-functions via the procedure in Appendix G. In ICCA, the empirical mutual information, $\hat{I}_k$, between each pair of canonical variates was mapped to the interval $[0, 1]$ via the formula $\hat{\rho}_k = \sqrt{1 - \exp(-2 \hat{I}_k)}$, which produces the empirical informational canonical correlation coefficients. The empirical KCCA was performed using Gaussian radial basis function kernels. Since KCCA masks the original coordinates of $X$ and $Y$, it is not illustrated for the graphical model selection tasks in simulation examples 1 and 2, which involve variable selection. In simulation examples 1 and 2, the canonical correlation coefficients and canonical directions were estimated using $N = 1000$ i.i.d. samples of $X$ and $Y$. The statistical significance of the empirical canonical correlation coefficients was tested using empirical estimates of p-values associated with rejecting the null hypothesis of no statistical dependence between $X$ and $Y$ (see Appendix H).
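The mapping from mutual information to the interval $[0, 1]$ is easy to check numerically. The sketch below assumes mutual information measured in nats; the function name `informational_correlation` is hypothetical. For a bivariate Gaussian pair with correlation $r$, the mutual information is $-\tfrac{1}{2}\log(1 - r^2)$, so the mapping returns $|r|$.

```python
import numpy as np

def informational_correlation(mutual_information):
    """Map mutual information (in nats) to [0, 1] via sqrt(1 - exp(-2 I))."""
    mi = np.asarray(mutual_information, dtype=float)
    return np.sqrt(1.0 - np.exp(-2.0 * mi))

# Bivariate Gaussian with correlation r: the mapping recovers |r|.
r = 0.6
gaussian_mi = -0.5 * np.log(1.0 - r**2)
rho = informational_correlation(gaussian_mi)
```

This is why the mapped quantity is comparable in scale to a linear correlation coefficient: it agrees with it in the Gaussian case and equals zero exactly when the mutual information is zero.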

A.  Simulation  example  1:  Selection  of  graphical  model  with  non-linear  connections 

In this example, we consider the random vectors $X = [X_1, X_2]^T$ and $Y = [Y_1, Y_2]^T$, where

\[
Y_1 = \cos(X_1) + 0.1 W,
\]

and $X_1$, $X_2$, $Y_2$, and $W$ are mutually independent standard normal random variables. For this example, the pair of linear combinations of the form $(a^T X, b^T Y)$ having maximal dependency is obtained for the vector pair $(a_1 = [1, 0]^T, b_1 = [1, 0]^T)$, which is identical to the true first-order MT-canonical directions. In this example, all pairs of linear combinations of the form $a^T X$ and $b^T Y$ have zero linear correlation even though not all of them are statistically independent. The dependencies between $X$ and $Y$ are depicted by the bipartite graphical model in Fig. 1.
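The following sketch simulates this model and contrasts the ordinary Pearson correlation between $X_1$ and $Y_1$ with a correlation computed under Gaussian-shaped sample weights. The weights are assumed to have the form $\exp(-\|x - s\|^2 / 2\sigma^2)\exp(-\|y - t\|^2 / 2\tau^2)$; the particular values of `s`, `sigma`, `t`, `tau`, the sample size, and the helper `weighted_corr` are hypothetical choices for illustration, not the parameters selected by the procedure in Appendix G.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000
X = rng.standard_normal((N, 2))
W = rng.standard_normal(N)
# Y1 depends on X1 through a cosine; Y2 is independent noise.
Y = np.column_stack([np.cos(X[:, 0]) + 0.1 * W, rng.standard_normal(N)])

def weighted_corr(a, b, w):
    """Correlation coefficient of a and b under normalized sample weights w."""
    w = w / w.sum()
    ma, mb = w @ a, w @ b
    cov = w @ ((a - ma) * (b - mb))
    return cov / np.sqrt((w @ (a - ma) ** 2) * (w @ (b - mb) ** 2))

# Uniform weights: ordinary Pearson correlation, close to zero.
plain = weighted_corr(X[:, 0], Y[:, 0], np.ones(N))

# Gaussian-shaped weights centered away from the origin (hypothetical values).
s, sigma = np.array([1.5, 0.0]), 0.5
t, tau = np.array([0.0, 0.0]), 2.0
w = (np.exp(-np.sum((X - s) ** 2, axis=1) / (2 * sigma**2))
     * np.exp(-np.sum((Y - t) ** 2, axis=1) / (2 * tau**2)))
localized = weighted_corr(X[:, 0], Y[:, 0], w)
```

Under the localized weights the cosine is nearly linear with negative slope, so the weighted correlation is strongly negative while the unweighted one is negligible.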

The averaged estimates of the MT, linear, and informational canonical correlation coefficients and their corresponding averaged p-values, based on 1000 Monte-Carlo simulations, are given in Table I. The sample means and standard deviations of the absolute dot products of $(a_1/\|a_1\|_2, \hat{a}_1/\|\hat{a}_1\|_2)$ and $(b_1/\|b_1\|_2, \hat{b}_1/\|\hat{b}_1\|_2)$, based on 1000 Monte-Carlo simulations, are given in Table II. The absolute dot products should be equal to 1 when the estimated canonical directions $\hat{a}_1$, $\hat{b}_1$ are equal to $a_1 = [1, 0]^T$, $b_1 = [1, 0]^T$, respectively. One can notice that in contrast to LCCA, MTCCA and ICCA detect the true dependencies between $X$ and $Y$, depicted by the bipartite graphical model in Fig. 1.

Fig. 1. The graphical model of dependencies in simulation example 1. A single edge exists between $X_1$ and $Y_1$ due to the non-linear relation model $Y_1 = \cos(X_1) + 0.1 W$. The correlation between $X_1$ and $Y_1$ under $P_{XY}$ is equal to zero even though they are dependent.


TABLE I

Simulation example 1: The averaged estimates of the MT, linear, and informational canonical correlation coefficients and their corresponding averaged p-values (in parentheses).

                 Exponential MT-functions   Gaussian MT-functions   LCCA          ICCA
$\hat{\rho}_1$   0.83 (0)                   0.88 (0)                0.06 (0.37)   0.85 (0)
$\hat{\rho}_2$   0.04 (0.38)                0.03 (0.36)             0.01 (0.45)   0.23 (0.42)

TABLE II

Simulation example 1: The sample means and standard deviations (in parentheses) of $c(a_1, \hat{a}_1)$ and $c(b_1, \hat{b}_1)$, where $c(u, v) \triangleq |u^T v| / (\|u\|_2 \|v\|_2)$.

                     Exponential MT-functions   Gaussian MT-functions    LCCA          ICCA
$c(a_1, \hat{a}_1)$  0.99 ($7 \cdot 10^{-4}$)   0.99 ($3 \cdot 10^{-4}$)  0.73 (0.27)   0.99 ($2 \cdot 10^{-5}$)
$c(b_1, \hat{b}_1)$  0.99 ($4 \cdot 10^{-4}$)   0.99 ($1 \cdot 10^{-4}$)  0.75 (0.22)   0.99 ($1 \cdot 10^{-5}$)
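The direction-agreement measure reported in Table II is the absolute dot product of the two normalized vectors. A minimal sketch follows; the function name `direction_agreement` is hypothetical.

```python
import numpy as np

def direction_agreement(u, v):
    """Absolute cosine of the angle between u and v (1 = same or opposite direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

same_line = direction_agreement([1, 0], [-3, 0])    # scale and sign do not matter
orthogonal = direction_agreement([1, 0], [0, 2])
diagonal = direction_agreement([1, 0], [1, 1])
```

Taking the absolute value makes the measure insensitive to the sign ambiguity of canonical directions, and the normalization makes it insensitive to their scale.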

Scatter plots of the empirical first-order MT, linear, and informational canonical variates $(\hat{a}_1^T X, \hat{b}_1^T Y)$ are shown in Figs. 2(a)-2(d). Observe that unlike LCCA, MTCCA and ICCA recover the true non-linear relation between $X$ and $Y$, which has a raised cosine shape. In these figures, we have also plotted the ellipses associated with the empirical covariance matrices of $[\hat{a}_1^T X, \hat{b}_1^T Y]^T$ under the probability measures $Q_{XY}^{(u_E,v_E)}$, $Q_{XY}^{(u_G,v_G)}$, and $P_{XY}$, respectively. Observing Figs. 2(a) and 2(b) one can notice that the local linear trend is better captured by MTCCA with Gaussian MT-functions due to their localization property, discussed in Subsection IV-B.

Fig. 2. Simulation example 1: Scatter plots of the empirical first-order canonical variates obtained by: (a) MTCCA with exponential MT-functions, (b) MTCCA with Gaussian MT-functions, (c) LCCA, and (d) ICCA. Note that, while the linear canonical variates are uninformative (circular Gaussian distributed), the MT and informational canonical variates have captured the non-linear structure (raised cosine shape) of the non-linear model. This occurs since all variables in example 1 have zero correlation but some variables are non-linearly dependent. The ellipses represent the associated covariance matrices under the probability measures $Q_{XY}^{(u_E,v_E)}$, $Q_{XY}^{(u_G,v_G)}$, and $P_{XY}$, respectively.
B. Simulation example 2: Selection of graphical model with linear and non-linear connections

In this example, we consider a more complex model. Let the random vectors $X = [X_1, X_2, X_3, X_4, X_5]^T$ and $Y = [Y_1, Y_2, Y_3]^T$ satisfy

\[
Y_1 = X_1 + 0.5 X_2 + 0.1 W_1,
\]

\[
Y_2 = \cos(X_3 + 0.75 X_4 + 0.5 X_5) + 0.1 W_2,
\]

where $X_i$, $i = 1, \ldots, 5$, $W_i$, $i = 1, 2$, and $Y_3$ are mutually independent standard normal random variables. In this example there exist two independent pairs of linear combinations $(a_k^T X, b_k^T Y)$, $k = 1, 2$, with maximal inter-dependencies. These maximally dependent canonical variates are obtained for the vector pairs $(a_1 = [1, 0.5, 0, 0, 0]^T, b_1 = [1, 0, 0]^T)$ and $(a_2 = [0, 0, 1, 0.75, 0.5]^T, b_2 = [0, 1, 0]^T)$, which are also the first-order and second-order MT-canonical directions. The dependencies between $X$ and $Y$ are depicted by the bipartite graphical model in Fig. 3.
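A quick simulation of this model shows why a purely linear analysis sees only the first connected component: along the true directions, the first pair of variates is almost perfectly correlated, while the second pair is uncorrelated despite being strongly dependent. The sample size and seed below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
X = rng.standard_normal((N, 5))
W = rng.standard_normal((N, 2))

# Linear and non-linear links of the model, plus an independent third output.
Y1 = X[:, 0] + 0.5 * X[:, 1] + 0.1 * W[:, 0]
Y2 = np.cos(X[:, 2] + 0.75 * X[:, 3] + 0.5 * X[:, 4]) + 0.1 * W[:, 1]
Y3 = rng.standard_normal(N)
Y = np.column_stack([Y1, Y2, Y3])

a1 = np.array([1.0, 0.5, 0.0, 0.0, 0.0])
a2 = np.array([0.0, 0.0, 1.0, 0.75, 0.5])

# Pearson correlation of the variates along the true directions.
corr_linear = np.corrcoef(X @ a1, Y1)[0, 1]      # close to 1
corr_nonlinear = np.corrcoef(X @ a2, Y2)[0, 1]   # close to 0
```

The second correlation vanishes because the cosine is an even function of a zero-mean symmetric variable, which is exactly the kind of dependence that the measure transformation is designed to expose.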


Fig. 3. The dependency graphical model corresponding to simulation example 2. There are two connected components $\{(X_1, Y_1), (X_2, Y_1)\}$ and $\{(X_3, Y_2), (X_4, Y_2), (X_5, Y_2)\}$.

The averaged estimates of the MT, linear, and informational canonical correlation coefficients and their corresponding averaged p-values, based on 1000 Monte-Carlo simulations, are given in Table III. The sample means and standard deviations of the absolute dot products of the pairs $(a_k/\|a_k\|_2, \hat{a}_k/\|\hat{a}_k\|_2)$ and $(b_k/\|b_k\|_2, \hat{b}_k/\|\hat{b}_k\|_2)$, $k = 1, 2$, based on 1000 Monte-Carlo simulations, are given in Table IV. Observe that both MTCCA and ICCA detect the true dependencies between $X$ and $Y$, depicted by the bipartite graphical model in Fig. 3. As expected, LCCA detects only the linearly dependent combinations.




TABLE III

Simulation example 2: The averaged estimates of the MT, linear, and informational canonical correlation coefficients and their corresponding averaged p-values (in parentheses).

                 Exponential MT-functions   Gaussian MT-functions   LCCA          ICCA
$\hat{\rho}_1$   1 (0)                      1 (0)                   1 (0)         0.93 (0)
$\hat{\rho}_2$   0.75 (0)                   0.9 (0)                 0.08 (0.22)   0.89 (0)
$\hat{\rho}_3$   0.08 (0.2)                 0.1 (0.18)              0.04 (0.35)   0.24 (0.27)

TABLE IV

Simulation example 2: The sample means and standard deviations (in parentheses) of $c(a_k, \hat{a}_k)$ and $c(b_k, \hat{b}_k)$, $k = 1, 2$, where $c(u, v) \triangleq |u^T v| / (\|u\|_2 \|v\|_2)$.

                     Exponential MT-functions   Gaussian MT-functions     LCCA                    ICCA
$c(a_1, \hat{a}_1)$  1 ($5 \cdot 10^{-5}$)      1 (illegible)             1 ($10^{-5}$)           0.99 ($7 \cdot 10^{-4}$)
$c(a_2, \hat{a}_2)$  0.99 ($6 \cdot 10^{-3}$)   0.99 ($8 \cdot 10^{-3}$)  0.5 (0.28)              0.99 ($1 \cdot 10^{-3}$)
$c(b_1, \hat{b}_1)$  1 ($8 \cdot 10^{-5}$)      1 (illegible)             1 ($2 \cdot 10^{-5}$)   0.99 ($2 \cdot 10^{-3}$)
$c(b_2, \hat{b}_2)$  0.99 ($2 \cdot 10^{-3}$)   0.99 ($6 \cdot 10^{-3}$)  0.7 (0.26)              0.99 ($3 \cdot 10^{-3}$)

C.  Measuring  long-term  associations  between  NASDAQ/NYSE  traded  companies 

Here, MTCCA is applied to a real-world example of capturing long-term associations between pairs of companies traded on the NASDAQ and NYSE stock markets. The compared companies were Microsoft (MSFT), Intel (INTC), Apple (AAPL), Merck (MRK), Pfizer (PFE), Johnson and Johnson (JNJ), American Express (AXP), JP Morgan (JPM), and Bank of America (BAC). For each pair of companies, we considered the random vectors $X = [X_1, X_2]^T$ and $Y = [Y_1, Y_2]^T$. The variables $X_1$ and $Y_1$ are the log-ratios of two consecutive daily closing prices of a stock, called log-returns. The variables $X_2$ and $Y_2$ are the log-ratios of two consecutive daily trading volumes of a stock, called log-volume ratios. Consecutive daily measurements of $X$ and $Y$ from January 2, 2001 to December 31, 2010, comprising 2514 samples, were obtained from the WRDS database [35].
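The construction of the two coordinates for one company can be sketched as follows. The price and volume numbers are made up for illustration and the helper name `log_ratios` is hypothetical; real inputs would be the daily closing prices and trading volumes of a stock.

```python
import numpy as np

def log_ratios(series):
    """Log-ratios of consecutive entries: log(x[n+1] / x[n])."""
    series = np.asarray(series, dtype=float)
    return np.diff(np.log(series))

# Made-up daily closing prices and trading volumes for a single stock.
closing = np.array([100.0, 102.0, 101.0, 105.0])
volume = np.array([1.0e6, 1.5e6, 1.2e6, 1.2e6])

# Each row is one sample [log-return, log-volume ratio].
X_pair = np.column_stack([log_ratios(closing), log_ratios(volume)])
```

A series of $T$ daily measurements yields $T - 1$ samples, and the log-returns telescope: their sum equals the logarithm of the ratio between the last and first prices.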

Figs. 4(a) and 4(b) display the matrix of empirical first-order MT-canonical correlation coefficients for the exponential and Gaussian MT-functions, respectively. Figs. 4(c)-4(e) show the matrix of empirical first-order canonical correlation coefficients obtained by LCCA, ICCA and KCCA, respectively. Note that MTCCA and KCCA better cluster companies in similar sectors: (MSFT, INTC, AAPL) - technology, (MRK, PFE, JNJ) - pharmaceuticals, (AXP, JPM, BAC) - financial. In this example, the p-values associated with all empirical first-order canonical correlation coefficients were less than 0.01.

The empirical first-order canonical correlation coefficients were used for constructing graphical models in which the nodes represent the compared companies. A pair of nodes was connected when its empirical first-order canonical correlation coefficient exceeded a threshold $\lambda$. In Figs. 5-7 the graphical models selected by MTCCA with exponential MT-functions are compared to LCCA, ICCA and KCCA, respectively. Similarly, in Figs. 8-10 the graphical models selected by MTCCA with Gaussian MT-functions are compared to LCCA, ICCA and KCCA, respectively. In the first column of each figure we show the graphs selected by MTCCA for $\lambda = 0.5, 0.55, 0.58$. In the second column we show the corresponding graphs selected by the other compared method by scanning $\lambda$ over the interval $[0, 1]$ and finding the graph with minimum edit distance [36]. The symmetric difference graphs are shown in the third column. The red lines in the symmetric difference graphs indicate edges found by MTCCA and not by the other compared method, and vice-versa for the black lines. Note that for all of the threshold parameters $\lambda$ investigated, the MTCCA graph shows an equal or larger number of dependencies than the closest LCCA, ICCA and KCCA graphs. This result suggests that MTCCA has captured more dependencies than LCCA, ICCA and KCCA. While there is no ground truth validation, the fact that MTCCA clusters together companies in similar sectors (banking, pharmaceuticals, and technology) provides anecdotal support for the power and applicability of MTCCA.
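The graph comparison described above can be sketched as follows. The sketch assumes that, for graphs on the same labeled node set, the edit distance is the number of edge insertions and deletions, i.e., the size of the symmetric difference of the edge sets; the exact distance of [36] may differ. The two coefficient matrices are made-up toy values and the function names are hypothetical.

```python
import numpy as np

def edges_above(coeffs, threshold):
    """Undirected edge set {(i, j), i < j} whose coefficient exceeds the threshold."""
    n = coeffs.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if coeffs[i, j] > threshold}

def closest_graph(reference_edges, coeffs, thresholds):
    """Scan thresholds and return (distance, threshold, edges) of the closest graph."""
    best = None
    for lam in thresholds:
        candidate = edges_above(coeffs, lam)
        distance = len(reference_edges ^ candidate)   # symmetric difference size
        if best is None or distance < best[0]:
            best = (distance, lam, candidate)
    return best

# Toy coefficient matrices for four nodes (illustrative values only).
method_a = np.array([[1.0, 0.7, 0.6, 0.2],
                     [0.7, 1.0, 0.55, 0.1],
                     [0.6, 0.55, 1.0, 0.3],
                     [0.2, 0.1, 0.3, 1.0]])
method_b = np.array([[1.0, 0.5, 0.05, 0.1],
                     [0.5, 1.0, 0.4, 0.1],
                     [0.05, 0.4, 1.0, 0.1],
                     [0.1, 0.1, 0.1, 1.0]])

reference = edges_above(method_a, 0.5)
distance, lam, closest = closest_graph(reference, method_b, np.linspace(0, 1, 101))
only_in_reference = reference - closest    # edges drawn in red in the figures
only_in_closest = closest - reference      # edges drawn in black in the figures
```

Scanning the threshold of the compared method removes the dependence on its particular coefficient scale, so the remaining symmetric difference reflects structural disagreement between the two methods.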

Fig. 11 depicts the distribution of the empirical MT, linear, and informational first-order canonical directions $\hat{a}_1 = [\hat{a}_{1,1}, \hat{a}_{1,2}]^T$ and $\hat{b}_1 = [\hat{b}_{1,1}, \hat{b}_{1,2}]^T$ on the unit circle. Observe that in MTCCA (first and second columns) $\hat{a}_{1,2}$ and $\hat{b}_{1,2}$ are relatively small in comparison to $\hat{a}_{1,1}$ and $\hat{b}_{1,1}$. One can conclude that, unlike LCCA and ICCA, MTCCA is zeroing in on the strong non-linear dependencies between the daily log-returns of these companies and is de-emphasizing the daily log-volume ratios. This analysis is not performed for KCCA since the empirical canonical directions obtained by KCCA do not correspond to the original coordinates of $X$ and $Y$.

We note that in this example the difference between MTCCA and ICCA may possibly arise from the sensitivity of fixed kernel density estimation, performed in ICCA, to the heavy-tailed financial data [37].

VII.  Conclusion 

In this paper, LCCA was generalized by applying a structured transform to the joint probability distribution of $X$ and $Y$. By modifying the functions associated with the transform, this generalization, called MTCCA, preserves independence and captures non-linear dependencies. Two classes of MTCCA were proposed based on specification of MT-functions in the exponential and Gaussian families, respectively. The proposed MTCCA approach was compared to LCCA, ICCA and KCCA for graphical model selection in simulated data having non-linear dependencies, and for measuring long-term associations between pairs of companies traded on the NASDAQ and NYSE stock markets. It is likely that there exist other classes of MT-functions that have a similar capability to accurately detect non-linear dependencies.

In the paper we have shown that the Hessian of the joint cumulant generating function (18) is a special case of the measure transformed covariance matrix with exponential MT-functions. Therefore, similarly to the generalization proposed in this paper, the techniques in [2 ]-[28], which are based on Hessians of the cumulant generating function, may also be generalized by the measure-transformation framework.

Fig. 4. NASDAQ/NYSE experiment. Empirical first-order canonical correlation coefficients obtained by (a) MTCCA with exponential MT-functions, (b) MTCCA with Gaussian MT-functions, (c) LCCA, (d) ICCA, and (e) KCCA. Note the three blocks of mutually high canonical correlations revealed by MTCCA and KCCA: MTCCA and KCCA better cluster companies in similar sectors: (MSFT, INTC, AAPL) - technology, (MRK, PFE, JNJ) - pharmaceuticals, (AXP, JPM, BAC) - financial.

Fig. 5. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with exponential MT-functions for $\lambda = 0.5, 0.55, 0.58$. Middle column: The closest graphs selected by LCCA. Right column: The symmetric difference graphs: the red lines indicate edges found by MTCCA and not by LCCA, and vice-versa for the black lines. For these values of $\lambda$, exponential MTCCA detects more dependencies than LCCA: the MTCCA graph has more edges than the closest LCCA graph.

Fig. 6. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with exponential MT-functions for $\lambda = 0.5, 0.55, 0.58$. Middle column: The closest graphs selected by ICCA. Right column: The symmetric difference graphs: the red lines indicate edges found by MTCCA and not by ICCA, and vice-versa for the black lines. For these values of $\lambda$, exponential MTCCA detects more dependencies than ICCA: the MTCCA graph has more edges than the closest ICCA graph.
Appendix

A. Proof of Proposition 1:

1) Property 1:

Since $\varphi_{u,v}(x, y)$ is nonnegative, then by Corollary 2.3.6 in [32] $Q_{XY}^{(u,v)}$ is a measure on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$. Furthermore, $Q_{XY}^{(u,v)}(\mathcal{X} \times \mathcal{Y}) = 1$, so that $Q_{XY}^{(u,v)}$ is a probability measure on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$.

Fig. 7. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with exponential MT-functions for $\lambda = 0.5, 0.55, 0.58$. Middle column: The closest graphs selected by KCCA. Right column: The symmetric difference graphs: the red lines indicate edges found by MTCCA and not by KCCA, and vice-versa for the black lines. For $\lambda = 0.58$ the MTCCA graph has one more edge than the closest KCCA graph.

2)  Property  2: 

Follows from definitions 4.1.1 and 4.1.3 in [32].

3)  Property  3: 

Let $Q_X^{(u,v)}$ and $Q_Y^{(u,v)}$ denote the marginal probability measures of $Q_{XY}^{(u,v)}$, defined on $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$, respectively. Additionally, let $A_x$ and $A_y$ denote arbitrary sets in the $\sigma$-algebras $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$, respectively. Using (8) and (9), the assumed statistical independence of $X$ and $Y$ under $P_{XY}$, and Tonelli's Theorem [33]:

\[
\begin{aligned}
Q_X^{(u,v)}(A_x) &= \int_{A_x \times \mathcal{Y}} dQ_{XY}^{(u,v)}(x, y) = \int_{A_x \times \mathcal{Y}} \frac{u(x)\, v(y)}{E[u(X)\, v(Y); P_{XY}]}\, dP_{XY}(x, y) \\
&= \int_{A_x} \frac{u(x)}{E[u(X); P_X]}\, dP_X(x) \int_{\mathcal{Y}} \frac{v(y)}{E[v(Y); P_Y]}\, dP_Y(y) = \int_{A_x} \frac{u(x)}{E[u(X); P_X]}\, dP_X(x).
\end{aligned} \tag{33}
\]

Similarly, it can be shown that $Q_Y^{(u,v)}(A_y) = \int_{A_y} \frac{v(y)}{E[v(Y); P_Y]}\, dP_Y(y)$ and

\[
Q_{XY}^{(u,v)}(A_x \times A_y) = \int_{A_x} \frac{u(x)}{E[u(X); P_X]}\, dP_X(x) \int_{A_y} \frac{v(y)}{E[v(Y); P_Y]}\, dP_Y(y) = Q_X^{(u,v)}(A_x)\, Q_Y^{(u,v)}(A_y). \tag{34}
\]

Therefore, since $A_x$ and $A_y$ are arbitrary, $X$ and $Y$ are statistically independent under the transformed probability measure $Q_{XY}^{(u,v)}$.

Fig. 8. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with Gaussian MT-functions for $\lambda = 0.5, 0.55, 0.58$. Middle column: The closest graphs selected by LCCA. Right column: The symmetric difference graphs: the red lines indicate edges found by MTCCA and not by LCCA, and vice-versa for the black lines. For these values of $\lambda$, Gaussian MTCCA detects more dependencies than LCCA: the MTCCA graph has more edges than the closest LCCA graph.

Fig. 9. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with Gaussian MT-functions for $\lambda = 0.5, 0.55, 0.58$. Middle column: The closest graphs selected by ICCA. Right column: The symmetric difference graphs: the red lines indicate edges found by MTCCA and not by ICCA, and vice-versa for the black lines. For these values of $\lambda$, Gaussian MTCCA detects more dependencies than ICCA: the MTCCA graph has more edges than the closest ICCA graph.

4)  Property  4: 

According to the definition of $\varphi_{u,v}(x, y)$ in (9), the strict positivity of $u(x)$ and $v(y)$, and Property 2, we have that $Q_{XY}^{(u,v)}$ is absolutely continuous w.r.t. $P_{XY}$ with strictly positive Radon-Nikodym derivative $\frac{dQ_{XY}^{(u,v)}(x, y)}{dP_{XY}(x, y)} = \varphi_{u,v}(x, y)$. Therefore, by Proposition 4.1.2 in [32] it is implied that $P_{XY}$ is absolutely continuous w.r.t. $Q_{XY}^{(u,v)}$ with a strictly positive Radon-Nikodym derivative given by

\[
\frac{dP_{XY}(x, y)}{dQ_{XY}^{(u,v)}(x, y)} = \frac{1}{\varphi_{u,v}(x, y)}. \tag{35}
\]
Fig. 10. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with Gaussian MT-functions for $\lambda = 0.5, 0.55, 0.58$. Middle column: The closest graphs selected by KCCA. Right column: The symmetric difference graphs: the red lines indicate edges found by MTCCA and not by KCCA, and vice-versa for the black lines. For $\lambda = 0.55, 0.58$ the MTCCA graph has more edges than the closest KCCA graph.


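Hence, let $A_x$ and $A_y$ denote arbitrary sets in the $\sigma$-algebras $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$, respectively. Using (9), (35), the assumed statistical independence of $X$ and $Y$ under $Q_{XY}^{(u,v)}$, and Tonelli's Theorem [33]:

\[
\begin{aligned}
P_{XY}(A_x \times A_y) &= \int_{A_x \times A_y} \frac{1}{\varphi_{u,v}(x, y)}\, dQ_{XY}^{(u,v)}(x, y) \\
&= E[u(X)\, v(Y); P_{XY}] \int_{A_x} \frac{1}{u(x)}\, dQ_X^{(u,v)}(x) \int_{A_y} \frac{1}{v(y)}\, dQ_Y^{(u,v)}(y).
\end{aligned} \tag{36}
\]

Similarly, it can be shown that

\[
P_X(A_x) = P_{XY}(A_x \times \mathcal{Y}) = E[u(X)\, v(Y); P_{XY}]\; E\left[\frac{1}{v(Y)}; Q_Y^{(u,v)}\right] \int_{A_x} \frac{1}{u(x)}\, dQ_X^{(u,v)}(x) \tag{37}
\]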
Fig. 11. NASDAQ/NYSE experiment. Distribution of the empirical MT, linear, and informational first-order canonical directions on the unit circle. Left to right ordering: First column - MTCCA with exponential MT-functions. Second column - MTCCA with Gaussian MT-functions. Third column - LCCA. Fourth column - ICCA. The estimated MT-canonical directions in the first and second columns are much more concentrated than the linear and informational canonical directions in the third and fourth columns, respectively. In particular, while linear and informational canonical directions appear to be equally sensitive to the daily log-returns and the daily log-volume ratios, MT-canonical directions are much more sensitive to the former than to the latter.


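and

\[
P_Y(A_y) = P_{XY}(\mathcal{X} \times A_y) = E[u(X)\, v(Y); P_{XY}]\; E\left[\frac{1}{u(X)}; Q_X^{(u,v)}\right] \int_{A_y} \frac{1}{v(y)}\, dQ_Y^{(u,v)}(y). \tag{38}
\]

Now, using (1), (9), and (10) we have that

\[
E\left[\frac{1}{u(X)}; Q_X^{(u,v)}\right] = E\left[\frac{1}{u(X)}; Q_{XY}^{(u,v)}\right] = E\left[\frac{\varphi_{u,v}(X, Y)}{u(X)}; P_{XY}\right] = \frac{E[v(Y); P_Y]}{E[u(X)\, v(Y); P_{XY}]}, \tag{39}
\]

and similarly,

\[
E\left[\frac{1}{v(Y)}; Q_Y^{(u,v)}\right] = E\left[\frac{1}{v(Y)}; Q_{XY}^{(u,v)}\right] = E\left[\frac{\varphi_{u,v}(X, Y)}{v(Y)}; P_{XY}\right] = \frac{E[u(X); P_X]}{E[u(X)\, v(Y); P_{XY}]}. \tag{40}
\]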
Additionally, by setting $A_x = \mathcal{X}$ and $A_y = \mathcal{Y}$ in (36), followed by using (1), (39), and (40), it is implied that

\[
E[u(X)\, v(Y); P_{XY}] = E[u(X); P_X]\; E[v(Y); P_Y]. \tag{41}
\]

Finally, substitution of (41) into (36), (40) into (37), and (39) into (38) yields

\[
P_{XY}(A_x \times A_y) = \int_{A_x} \frac{E[u(X); P_X]}{u(x)}\, dQ_X^{(u,v)}(x) \int_{A_y} \frac{E[v(Y); P_Y]}{v(y)}\, dQ_Y^{(u,v)}(y) = P_X(A_x)\, P_Y(A_y), \tag{42}
\]

and therefore, since $A_x$ and $A_y$ are arbitrary, $X$ and $Y$ are statistically independent under $P_{XY}$. □


B. Proof of Theorem 1:

Using (18) and (19) one can verify that if the condition in (20) is satisfied, then

\[
M_{XY}(s, t) = M_X(s)\, M_Y(t) \quad \forall (s, t) \in U, \tag{43}
\]

where $M_X(\cdot)$ and $M_Y(\cdot)$ are the marginal moment generating functions of $X$ and $Y$, respectively. The joint moment generating function restricted to any open region containing the origin, within its region of convergence, uniquely determines the joint distribution [29], [30] (this property stems from the analyticity of the joint moment generating function about the origin). Hence, by the relation above we have that $X$ and $Y$ are statistically independent. Conversely, if $X$ and $Y$ are statistically independent under $P_{XY}$, then by Property 3 of Proposition 1 we have that $\Sigma_{XY}^{(u_E,v_E)}(s, t) = 0$ for all $(s, t) \in U$. □
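The step leading to (43) can be spelled out as a short worked derivation. It assumes, in line with the remark in the Conclusion, that the measure transformed cross-covariance with exponential MT-functions equals the cross block of the Hessian of $\log M_{XY}(s,t)$, and that condition (20) states that this block vanishes on $U$. The symbol $H(s,t)$ and the restriction to an open box $B \subseteq U$ centered at the origin are introduced here for convenience only.

```latex
% H(s,t) denotes the cross block of the Hessian of the joint cumulant generating function.
\[
H(s,t) \triangleq \frac{\partial^2 \log M_{XY}(s,t)}{\partial s\, \partial t^T} = 0
\quad \forall (s,t) \in B .
\]
% For fixed (s,t) in B, integrate the mixed derivative of
% F(\alpha,\beta) = \log M_{XY}(\alpha s, \beta t) over the unit square.
\[
\log M_{XY}(s,t) - \log M_{XY}(s,0) - \log M_{XY}(0,t) + \log M_{XY}(0,0)
= \int_0^1 \!\! \int_0^1 s^T H(\alpha s, \beta t)\, t \; d\alpha\, d\beta = 0 .
\]
% Use M_{XY}(s,0) = M_X(s), M_{XY}(0,t) = M_Y(t), and M_{XY}(0,0) = 1.
\[
\log M_{XY}(s,t) = \log M_X(s) + \log M_Y(t)
\quad \Longrightarrow \quad
M_{XY}(s,t) = M_X(s)\, M_Y(t) \quad \forall (s,t) \in B .
\]
```

Since $B$ is itself an open region containing the origin, the factorization on $B$ is already enough for the uniqueness property of the joint moment generating function to apply.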


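C. Proof of Theorem 2:

Using (9), (14), and (21) one can easily verify that

\[
\begin{aligned}
\Sigma_{XY}^{(u_G,v_G)}(s, t) &= \frac{E\left[X Y^T g(X)\, h(Y) \exp\left(\frac{s^T X}{\sigma^2} + \frac{t^T Y}{\tau^2}\right); P_{XY}\right]}{E\left[g(X)\, h(Y) \exp\left(\frac{s^T X}{\sigma^2} + \frac{t^T Y}{\tau^2}\right); P_{XY}\right]} \\
&\quad - \frac{E\left[X g(X)\, h(Y) \exp\left(\frac{s^T X}{\sigma^2} + \frac{t^T Y}{\tau^2}\right); P_{XY}\right] E\left[Y^T g(X)\, h(Y) \exp\left(\frac{s^T X}{\sigma^2} + \frac{t^T Y}{\tau^2}\right); P_{XY}\right]}{E^2\left[g(X)\, h(Y) \exp\left(\frac{s^T X}{\sigma^2} + \frac{t^T Y}{\tau^2}\right); P_{XY}\right]},
\end{aligned} \tag{44}
\]

where

\[
g(X) \triangleq \exp\left(-\frac{\|X\|^2}{2\sigma^2}\right) \quad \text{and} \quad h(Y) \triangleq \exp\left(-\frac{\|Y\|^2}{2\tau^2}\right). \tag{45}
\]

Additionally, define

\[
M_{XY}^{(g,h)}(s, t) \triangleq E\left[\exp\left(s^T X + t^T Y\right); Q_{XY}^{(g,h)}\right] \tag{46}
\]

as the joint moment generating function of $X$ and $Y$ under the transformed probability measure $Q_{XY}^{(g,h)}$ associated with the MT-functions $g(X)$ and $h(Y)$ in (45). Using (1) and (10) it can be shown that

\[
M_{XY}^{(g,h)}(s, t) = E\left[\exp\left(s^T X + t^T Y\right) \varphi_{g,h}(X, Y); P_{XY}\right], \tag{47}
\]

where $\varphi_{g,h}(X, Y)$ is defined in (9). Therefore, by (44) and (47) we have that

\[
\Sigma_{XY}^{(u_G,v_G)}(s, t) = \sigma^2 \tau^2\, \frac{\partial^2 \log M_{XY}^{(g,h)}\left(\sigma^{-2} s, \tau^{-2} t\right)}{\partial s\, \partial t^T}. \tag{48}
\]

Hence, if the condition in (22) is satisfied, then by the properties of the joint moment generating function [29], [30], it is implied that $X$ and $Y$ are statistically independent under $Q_{XY}^{(g,h)}$. Thus, since the MT-functions $g(X)$ and $h(Y)$ are strictly positive, then by Property 4 of Proposition 1 we conclude that $X$ and $Y$ are statistically independent under $P_{XY}$. Conversely, if $X$ and $Y$ are statistically independent under $P_{XY}$, then by Property 3 of Proposition 1 we have that $\Sigma_{XY}^{(u_G,v_G)}(s, t) = 0$ for all $(s, t) \in U$. □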


D.  Proof  of  Proposition  2: 


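Let $e_i^{(p)}$ denote a $p$-dimensional column vector, where $\left[e_i^{(p)}\right]_k = \delta_{i,k}$, and $\delta_{\cdot,\cdot}$ denotes the Kronecker delta function. It is easily verified that

\[
\sum_{i=1}^{p} \sum_{j=1}^{q} \frac{\left( e_i^{(p)T} \Sigma_{XY}^{(u,v)}(s,t)\, e_j^{(q)} \right)^2}{e_i^{(p)T} \Sigma_X^{(u,v)}(s,t)\, e_i^{(p)} \; e_j^{(q)T} \Sigma_Y^{(u,v)}(s,t)\, e_j^{(q)}}
\leq p \cdot q \cdot \max_{a \neq 0,\, b \neq 0} \frac{\left( a^T \Sigma_{XY}^{(u,v)}(s,t)\, b \right)^2}{a^T \Sigma_X^{(u,v)}(s,t)\, a \; b^T \Sigma_Y^{(u,v)}(s,t)\, b}. \tag{49}
\]

Hence, by (49)

\[
\begin{aligned}
\left( \frac{1}{pq} \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{\left( e_i^{(p)T} \Sigma_{XY}^{(u,v)}(s,t)\, e_j^{(q)} \right)^2}{e_i^{(p)T} \Sigma_X^{(u,v)}(s,t)\, e_i^{(p)} \; e_j^{(q)T} \Sigma_Y^{(u,v)}(s,t)\, e_j^{(q)}} \right)^{1/2}
&\leq \left( \max_{a \neq 0,\, b \neq 0} \frac{\left( a^T \Sigma_{XY}^{(u,v)}(s,t)\, b \right)^2}{a^T \Sigma_X^{(u,v)}(s,t)\, a \; b^T \Sigma_Y^{(u,v)}(s,t)\, b} \right)^{1/2} \\
&= \max_{a \neq 0,\, b \neq 0} \frac{a^T \Sigma_{XY}^{(u,v)}(s,t)\, b}{\sqrt{a^T \Sigma_X^{(u,v)}(s,t)\, a}\, \sqrt{b^T \Sigma_Y^{(u,v)}(s,t)\, b}} \\
&= \max_{a,\, b}\; a^T \Sigma_{XY}^{(u,v)}(s,t)\, b \quad \text{s.t.} \quad a^T \Sigma_X^{(u,v)}(s,t)\, a = b^T \Sigma_Y^{(u,v)}(s,t)\, b = 1,
\end{aligned} \tag{50}
\]

where the last equality stems from the invariance of $\frac{a^T \Sigma_{XY}^{(u,v)}(s,t)\, b}{\sqrt{a^T \Sigma_X^{(u,v)}(s,t)\, a}\, \sqrt{b^T \Sigma_Y^{(u,v)}(s,t)\, b}}$ to normalization of $a$ and $b$. Therefore, according to (15), (23) and (50), the relation in (24) is verified. □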
E. Proof of Theorem 3:

If $\rho_1\left( \Sigma_X^{(u,v)}(s^*,t^*), \Sigma_Y^{(u,v)}(s^*,t^*), \Sigma_{XY}^{(u,v)}(s^*,t^*) \right) = 0$, then by (24) and the non-negativity of $\psi(\cdot, \cdot, \cdot)$

\[
\psi\left( \Sigma_X^{(u,v)}(s^*,t^*),\, \Sigma_Y^{(u,v)}(s^*,t^*),\, \Sigma_{XY}^{(u,v)}(s^*,t^*) \right) = 0.
\]

Therefore, since by (25) $(s^*, t^*)$ are the maximizers of $\psi\left( \Sigma_X^{(u,v)}(s,t), \Sigma_Y^{(u,v)}(s,t), \Sigma_{XY}^{(u,v)}(s,t) \right)$ over $V$, which is a closed region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin, we have that

\[
\psi\left( \Sigma_X^{(u,v)}(s,t),\, \Sigma_Y^{(u,v)}(s,t),\, \Sigma_{XY}^{(u,v)}(s,t) \right) = 0 \quad \forall (s,t) \in V.
\]

Hence, by the definition (23) of $\psi(\cdot, \cdot, \cdot)$, $\Sigma_{XY}^{(u,v)}(s, t) = 0$ on the interior of $V$, which is an open region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. Thus, since the MT-functions $u(\cdot)$ and $v(\cdot)$ are chosen according to (17) or (21), by Theorems 1 and 2 $X$ and $Y$ must be statistically independent under $P_{XY}$.

Conversely, if $X$ and $Y$ are statistically independent under $P_{XY}$, then by Property 3 of Proposition 1 we have that $\Sigma_{XY}^{(u,v)}(s, t) = 0$ for all $(s, t) \in V$, and in particular for $(s^*, t^*)$. Therefore, by (15),

\[
\rho_1\left( \Sigma_X^{(u,v)}(s^*,t^*),\, \Sigma_Y^{(u,v)}(s^*,t^*),\, \Sigma_{XY}^{(u,v)}(s^*,t^*) \right) = 0.
\]

□


F.  Proof  of  Proposition  3: 

It suffices to show that if the conditions in (31) and (32) are satisfied, then $\hat{\Sigma}_{XY}^{(u,v)} \to \Sigma_{XY}^{(u,v)}$ almost surely as $N \to \infty$. Convergence proofs for $\hat{\Sigma}_X^{(u,v)}$ and $\hat{\Sigma}_Y^{(u,v)}$ are very similar and therefore omitted. According to (28)-(30)

\[
\lim_{N \to \infty} \hat{\Sigma}_{XY}^{(u,v)} = \lim_{N \to \infty} \sum_{n=1}^{N} X_n Y_n^T \hat{\varphi}_{u,v}(X_n, Y_n) - \lim_{N \to \infty} \hat{\mu}_X^{(u,v)} \lim_{N \to \infty} \hat{\mu}_Y^{(u,v)T}, \tag{51}
\]

where

\[
\lim_{N \to \infty} \sum_{n=1}^{N} X_n Y_n^T \hat{\varphi}_{u,v}(X_n, Y_n) = \frac{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} X_n Y_n^T u(X_n)\, v(Y_n)}{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} u(X_n)\, v(Y_n)}, \tag{52}
\]

\[
\lim_{N \to \infty} \hat{\mu}_X^{(u,v)} = \frac{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} X_n u(X_n)\, v(Y_n)}{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} u(X_n)\, v(Y_n)}, \tag{53}
\]

\[
\lim_{N \to \infty} \hat{\mu}_Y^{(u,v)} = \frac{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} Y_n u(X_n)\, v(Y_n)}{\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} u(X_n)\, v(Y_n)}, \tag{54}
\]

and it is assumed that the denominator

\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} u(X_n)\, v(Y_n) \neq 0 \quad \text{a.s.} \tag{55}
\]

In the following, the limits of the series in the r.h.s. of (52)-(54) are obtained. Additionally, in Remark 3 below, we show that the assumption in (55) is satisfied. Since $(\mathbf{X}_n, \mathbf{Y}_n)$, $n = 1,\ldots,N$, is a sequence of i.i.d. samples of $(\mathbf{X}, \mathbf{Y})$, the random matrices $\mathbf{X}_n \mathbf{Y}_n^T u(\mathbf{X}_n) v(\mathbf{Y}_n)$, $n = 1,\ldots,N$, in the r.h.s. of (52), define a sequence of i.i.d. samples of $\mathbf{X}\mathbf{Y}^T u(\mathbf{X}) v(\mathbf{Y})$. Moreover, if $E[X_k^4; P_X] < \infty$ for any $k = 1,\ldots,p$, $E[Y_l^4; P_Y] < \infty$ for any $l = 1,\ldots,q$, $E[u^4(\mathbf{X}); P_X] < \infty$, and $E[v^4(\mathbf{Y}); P_Y] < \infty$, then


$$
E\left[\left|X_k Y_l u(\mathbf{X}) v(\mathbf{Y})\right|; P_{XY}\right] \le \left(E\left[(X_k Y_l)^2; P_{XY}\right] E\left[(u(\mathbf{X}) v(\mathbf{Y}))^2; P_{XY}\right]\right)^{1/2} \le \left(E[X_k^4; P_X]\, E[Y_l^4; P_Y]\, E[u^4(\mathbf{X}); P_X]\, E[v^4(\mathbf{Y}); P_Y]\right)^{1/4} < \infty, \tag{56}
$$

for any $k = 1,\ldots,p$ and any $l = 1,\ldots,q$, where the semi-inequalities stem from the Hölder inequality for random variables [32]. Therefore, by Khinchine's strong law of large numbers (KSLLN) [33]

$$
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n \mathbf{Y}_n^T u(\mathbf{X}_n) v(\mathbf{Y}_n) = E\left[\mathbf{X}\mathbf{Y}^T u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right] \quad \text{a.s.} \tag{57}
$$
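The moment bound in (56), which is what licenses the use of the KSLLN in (57), can be unpacked as follows. This is a sketch of the standard argument under the four finite fourth-moment conditions stated above, not necessarily the authors' exact intermediate step. The generalized Hölder inequality with four exponents equal to 4 (so that $\tfrac14+\tfrac14+\tfrac14+\tfrac14 = 1$) gives the final bound directly, and the same bound follows from applying the Cauchy-Schwarz inequality twice:

$$
\begin{aligned}
E\left[\left|X_k Y_l\, u(\mathbf{X})\, v(\mathbf{Y})\right|\right]
&\le \left(E\left[X_k^2 Y_l^2\right]\right)^{1/2} \left(E\left[u^2(\mathbf{X})\, v^2(\mathbf{Y})\right]\right)^{1/2} \\
&\le \left(E[X_k^4]\, E[Y_l^4]\right)^{1/4} \left(E[u^4(\mathbf{X})]\, E[v^4(\mathbf{Y})]\right)^{1/4},
\end{aligned}
$$

where all expectations are taken under $P_{XY}$, and expectations of functions of $\mathbf{X}$ alone (or $\mathbf{Y}$ alone) coincide with those under the marginals $P_X$ (or $P_Y$).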


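Similarly, it can be shown that if the conditions in (31) and (32) are satisfied, then by the KSLLN

$$
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n u(\mathbf{X}_n) v(\mathbf{Y}_n) = E\left[\mathbf{X} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right] \quad \text{a.s.}, \tag{58}
$$

$$
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{Y}_n u(\mathbf{X}_n) v(\mathbf{Y}_n) = E\left[\mathbf{Y} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right] \quad \text{a.s.}, \tag{59}
$$

and

$$
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} u(\mathbf{X}_n) v(\mathbf{Y}_n) = E\left[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right] \quad \text{a.s.} \tag{60}
$$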


Remark 3. By (60) and the assumption in (7), the denominator in the r.h.s. of (52)-(54) is non-zero almost surely.


Therefore,  since  the  sequences  in  the  l.h.s.  of  (52)-(54)  are  obtained  by  continuous  mappings  of  the 
elements  of  the  sequences  in  their  r.h.s.,  then  by  (57)-(60),  and  the  Mann-Wald  Theorem  [34] 


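$$
\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n \mathbf{Y}_n^T \hat{\varphi}_{u,v}(\mathbf{X}_n,\mathbf{Y}_n) = \frac{E\left[\mathbf{X}\mathbf{Y}^T u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right]}{E\left[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right]} = E\left[\mathbf{X}\mathbf{Y}^T \varphi_{u,v}(\mathbf{X},\mathbf{Y}); P_{XY}\right] \quad \text{a.s.}, \tag{61}
$$

$$
\lim_{N\to\infty} \hat{\boldsymbol{\mu}}_X^{(u,v)} = \frac{E\left[\mathbf{X} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right]}{E\left[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right]} = E\left[\mathbf{X} \varphi_{u,v}(\mathbf{X},\mathbf{Y}); P_{XY}\right] \quad \text{a.s.}, \tag{62}
$$

and

$$
\lim_{N\to\infty} \hat{\boldsymbol{\mu}}_Y^{(u,v)} = \frac{E\left[\mathbf{Y} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right]}{E\left[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}\right]} = E\left[\mathbf{Y} \varphi_{u,v}(\mathbf{X},\mathbf{Y}); P_{XY}\right] \quad \text{a.s.}, \tag{63}
$$

where the last equalities in (61)-(63) follow from the definition of $\varphi_{u,v}(\mathbf{X},\mathbf{Y})$ in (9).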

Thus, since the sequence in the l.h.s. of (51) is obtained by continuous mappings of the elements of the sequences in its r.h.s., then by (61)-(63), the Mann-Wald Theorem, and (14) it is concluded that

$$
\hat{\boldsymbol{\Sigma}}_{XY}^{(u,v)} \to \boldsymbol{\Sigma}_{XY}^{(u,v)} \quad \text{a.s. as } N \to \infty.
$$

□
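The estimator analyzed in this proof is a ratio of weighted sample averages, which a short numerical sketch can make concrete. The function below is an illustration only: the name `mt_covariances`, the argument layout, and the choice to normalize the weights by their sample mean are assumptions made here, chosen to match the ratios in (52)-(54) rather than taken from the definitions in (26)-(30).

```python
import numpy as np

def mt_covariances(X, Y, u_vals, v_vals):
    """Weighted sample covariances built from the ratios in (52)-(54).

    X: (N, p) array of samples, Y: (N, q) array of samples.
    u_vals, v_vals: (N,) arrays holding u(X_n) and v(Y_n).
    Hypothetical helper for illustration; not the authors' code.
    """
    N = X.shape[0]
    w = u_vals * v_vals
    # Empirical weights: u(X_n) v(Y_n) divided by their sample mean,
    # which is assumed to be non-zero as in (55).
    phi = w / w.mean()
    mu_x = phi @ X / N                      # weighted mean of X
    mu_y = phi @ Y / N                      # weighted mean of Y
    Xw = X * phi[:, None]
    Yw = Y * phi[:, None]
    S_x = Xw.T @ X / N - np.outer(mu_x, mu_x)
    S_y = Yw.T @ Y / N - np.outer(mu_y, mu_y)
    S_xy = Xw.T @ Y / N - np.outer(mu_x, mu_y)
    return S_x, S_y, S_xy
```

With constant MT-function values the weights are all equal to one, and the sketch reduces to the ordinary sample covariance normalized by N, which is a convenient sanity check.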


G.  The  empirical  MTCCA  procedure  with  the  exponential  and  Gaussian  MT-functions 


Given  N  i.i.d.  samples  of  X  and  Y,  the  empirical  MTCCA  procedure  with  the  exponential  and  Gaussian 
MT-functions  was  carried  out  via  the  following  steps: 

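1) Estimate the optimal MT-function parameters in (17) and (21) according to

$$
(\hat{\mathbf{s}}^*, \hat{\mathbf{t}}^*) = \arg\max_{(\mathbf{s},\mathbf{t}) \in V} \psi\left(\hat{\boldsymbol{\Sigma}}_X^{(u,v)}(\mathbf{s},\mathbf{t}), \hat{\boldsymbol{\Sigma}}_Y^{(u,v)}(\mathbf{s},\mathbf{t}), \hat{\boldsymbol{\Sigma}}_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})\right), \tag{64}
$$

where $\psi(\cdot,\cdot,\cdot)$ is defined in (23), and $\hat{\boldsymbol{\Sigma}}_X^{(u,v)}(\mathbf{s},\mathbf{t})$, $\hat{\boldsymbol{\Sigma}}_Y^{(u,v)}(\mathbf{s},\mathbf{t})$, and $\hat{\boldsymbol{\Sigma}}_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})$ are the estimates in (26)-(28) of the covariance matrices $\boldsymbol{\Sigma}_X^{(u,v)}(\mathbf{s},\mathbf{t})$, $\boldsymbol{\Sigma}_Y^{(u,v)}(\mathbf{s},\mathbf{t})$, and $\boldsymbol{\Sigma}_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})$, respectively. The maximization in (64) was carried out numerically using gradient ascent over the search region $V$, which was selected as follows: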


a)  For  the  exponential  MT-functions,  we  chose 


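$$
V_E = \left\{\mathbf{s} \in \mathbb{R}^p, \mathbf{t} \in \mathbb{R}^q : J_{XY}(\mathbf{s},\mathbf{t}) \le D\right\},
$$

where $D = \sqrt{2}$, and $J_{XY}(\mathbf{s},\mathbf{t}) = 1 + \mathbf{s}^T\hat{\boldsymbol{\mu}}_X + \mathbf{t}^T\hat{\boldsymbol{\mu}}_Y + \frac{1}{2}\mathbf{s}^T\hat{\mathbf{R}}_X\mathbf{s} + \mathbf{s}^T\hat{\mathbf{R}}_{XY}\mathbf{t} + \frac{1}{2}\mathbf{t}^T\hat{\mathbf{R}}_Y\mathbf{t}$ is a quadratic empirical approximation of the joint moment generating function $M_{XY}(\mathbf{s},\mathbf{t})$ in (19). The vectors $\hat{\boldsymbol{\mu}}_X$ and $\hat{\boldsymbol{\mu}}_Y$ denote the sample expectations of $\mathbf{X}$ and $\mathbf{Y}$, respectively. The matrices $\hat{\mathbf{R}}_X$, $\hat{\mathbf{R}}_Y$, and $\hat{\mathbf{R}}_{XY}$ denote the sample auto-correlation matrix of $\mathbf{X}$, the sample auto-correlation matrix of $\mathbf{Y}$, and their sample cross-correlation matrix, respectively. Since $D = \sqrt{2}$ and $J_{XY}(\mathbf{s},\mathbf{t})$ is quadratic and takes a unit value at the origin, $V_E$ defines a closed region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin.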

b)  For  the  Gaussian  MT-functions,  the  search  region  was  set  to 


$$
V_G = \left\{\mathbf{s} \in \mathbb{R}^p, \mathbf{t} \in \mathbb{R}^q : \nu(X_k, 5) \le s_k \le \nu(X_k, 95),\ \nu(Y_l, 5) \le t_l \le \nu(Y_l, 95),\ k = 1,\ldots,p,\ l = 1,\ldots,q\right\},
$$

where $s_k$ and $t_l$ are the $k$-th and $l$-th entries of $\mathbf{s}$ and $\mathbf{t}$, respectively, and $\nu(X, \alpha)$ is the empirical $\alpha$-th percentile of the random variable $X$. One can notice that $V_G$ defines a closed rectangle in $\mathbb{R}^p \times \mathbb{R}^q$. In the considered examples it was verified that $V_G$ contains the origin. We note that in the case where $V_G$ does not contain the origin, one can always subtract the expectations of $\mathbf{X}$ and $\mathbf{Y}$ and perform MTCCA on $\mathbf{X}' = \mathbf{X} - E[\mathbf{X}; P_X]$ and $\mathbf{Y}' = \mathbf{Y} - E[\mathbf{Y}; P_Y]$.

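2) Obtain estimates of the MT-canonical correlation coefficients,

$$
\hat{\rho}_k = \hat{\rho}_k^{(u,v)}(\hat{\mathbf{s}}^*, \hat{\mathbf{t}}^*), \quad k = 1,\ldots,r,
$$

and estimates of the MT-canonical directions,

$$
(\hat{\mathbf{a}}_k, \hat{\mathbf{b}}_k), \quad k = 1,\ldots,r,
$$

by solving the following GEVD equation

$$
\begin{bmatrix} \mathbf{0} & \hat{\boldsymbol{\Sigma}}_{XY}^{(u,v)}(\hat{\mathbf{s}}^*,\hat{\mathbf{t}}^*) \\ \hat{\boldsymbol{\Sigma}}_{XY}^{(u,v)T}(\hat{\mathbf{s}}^*,\hat{\mathbf{t}}^*) & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} = \rho \begin{bmatrix} \hat{\boldsymbol{\Sigma}}_X^{(u,v)}(\hat{\mathbf{s}}^*,\hat{\mathbf{t}}^*) & \mathbf{0} \\ \mathbf{0} & \hat{\boldsymbol{\Sigma}}_Y^{(u,v)}(\hat{\mathbf{s}}^*,\hat{\mathbf{t}}^*) \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix}, \tag{65}
$$

where $\rho = \hat{\rho}_k$ is the $k$-th largest generalized eigenvalue of the pencil in (65), and $[\mathbf{a}^T, \mathbf{b}^T]^T = [\hat{\mathbf{a}}_k^T, \hat{\mathbf{b}}_k^T]^T$ is its corresponding generalized eigenvector.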
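The GEVD in step 2 can be sketched numerically as follows. Rather than forming the block pencil explicitly, this sketch whitens the cross-covariance with Cholesky factors and takes a singular value decomposition, which yields the non-negative generalized eigenvalues of the same pencil. The function name `cca_from_covariances` is hypothetical, and the sketch assumes the two auto-covariance estimates are positive definite.

```python
import numpy as np

def cca_from_covariances(S_x, S_y, S_xy):
    """Solve the pencil [[0, S_xy], [S_xy^T, 0]] v = rho [[S_x, 0], [0, S_y]] v.

    Returns the canonical correlations in descending order and the
    direction matrices A (p, r) and B (q, r), with r = min(p, q).
    Assumes S_x and S_y are symmetric positive definite.
    """
    L_x = np.linalg.cholesky(S_x)            # S_x = L_x L_x^T
    L_y = np.linalg.cholesky(S_y)            # S_y = L_y L_y^T
    # Whitened cross-covariance K = L_x^{-1} S_xy L_y^{-T}.
    K = np.linalg.solve(L_x, S_xy)
    K = np.linalg.solve(L_y, K.T).T
    U, rho, Vt = np.linalg.svd(K, full_matrices=False)
    # Map the singular vectors back: a_k = L_x^{-T} u_k, b_k = L_y^{-T} v_k,
    # so that a_k^T S_x a_k = b_k^T S_y b_k = 1.
    A = np.linalg.solve(L_x.T, U)
    B = np.linalg.solve(L_y.T, Vt.T)
    return rho, A, B
```

The singular values are returned sorted from largest to smallest, so `rho[k-1]` plays the role of the k-th order coefficient, and the columns of `A` and `B` are the matching directions. The pencil also has the mirrored eigenvalues with negative sign, which the SVD route simply never produces.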


In all considered examples the width parameters $\sigma$ and $\tau$ of the Gaussian MT-functions (21) were set to $\sigma = \frac{1}{p}\sum_{k=1}^{p} \hat{\sigma}(X_k)$ and $\tau = \frac{1}{q}\sum_{l=1}^{q} \hat{\sigma}(Y_l)$, where $\hat{\sigma}(X)$ denotes the empirical standard deviation of the random variable $X$.
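The search regions of step 1 and the width rule above are simple functions of the samples, and could be computed along the following lines. All three function names are hypothetical, the standard deviation uses NumPy's default normalization by N (the text does not say which normalization was used), and the percentile interpolation is NumPy's default.

```python
import numpy as np

def gaussian_search_box(X, Y, lo=5.0, hi=95.0):
    """Per-coordinate [5th, 95th] percentile bounds defining the rectangle V_G."""
    s_lo, s_hi = np.percentile(X, [lo, hi], axis=0)
    t_lo, t_hi = np.percentile(Y, [lo, hi], axis=0)
    return (s_lo, s_hi), (t_lo, t_hi)

def gaussian_widths(X, Y):
    """Width parameters: average empirical standard deviation per vector."""
    sigma = X.std(axis=0).mean()
    tau = Y.std(axis=0).mean()
    return sigma, tau

def in_exponential_region(s, t, X, Y, D=np.sqrt(2.0)):
    """Membership test for V_E via the quadratic approximation J_XY(s, t)."""
    N = X.shape[0]
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    # Sample auto- and cross-correlation matrices (non-centered second moments).
    R_x, R_y, R_xy = X.T @ X / N, Y.T @ Y / N, X.T @ Y / N
    J = (1.0 + s @ mu_x + t @ mu_y
         + 0.5 * s @ R_x @ s + s @ R_xy @ t + 0.5 * t @ R_y @ t)
    return bool(J <= D)
```

Since the quadratic approximation equals one at the origin and the threshold is larger than one, the origin always passes the membership test, in line with the remark in item a).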


H.  Testing  the  statistical  significance  of  the  empirical  canonical  correlation  coefficients 

Let $\mathcal{X}_N = \{\mathbf{X}_n\}_{n=1}^{N}$ and $\mathcal{Y}_N = \{\mathbf{Y}_n\}_{n=1}^{N}$ denote sequences of $N$ i.i.d. samples of $\mathbf{X}$ and $\mathbf{Y}$, respectively. Additionally, let $\hat{\rho}_k(\mathcal{X}_N, \mathcal{Y}_N)$ denote the empirical $k$-th order canonical correlation coefficient based on $\mathcal{X}_N$ and $\mathcal{Y}_N$. A bootstrap based procedure for testing the statistical significance of the empirical $k$-th order canonical correlation coefficient is specified below:

1) Repeat the following procedure $M$ times (with index $m = 1,\ldots,M$):

a) Generate a randomly permuted version of the sequence $\mathcal{Y}_N$, denoted by $\mathcal{Y}_N^{(m)}$.

b) Compute the statistic $\theta_m = \hat{\rho}_k(\mathcal{X}_N, \mathcal{Y}_N^{(m)})$.

2) Construct an empirical cumulative distribution function from the sample statistics $\theta_m$, $m = 1,\ldots,M$, as

$$
F_\Theta(\theta) = \Pr(\Theta \le \theta) = \frac{1}{M}\sum_{m=1}^{M} 1_{x \ge 0}(x = \theta - \theta_m),
$$

where $1_{x \ge 0}(\cdot)$ is an indicator function on its argument $x$.

3) Compute the p-value

$$
p_0 = 1 - F_\Theta(\theta_0),
$$

where $\theta_0 = \hat{\rho}_k(\mathcal{X}_N, \mathcal{Y}_N)$ is the true detection statistic.

4) If $p_0 < \alpha$, then we have that $\hat{\rho}_k(\mathcal{X}_N, \mathcal{Y}_N)$ is significant at level $\alpha$, leading to rejection of the null-hypothesis of no dependence between $\mathbf{X}$ and $\mathbf{Y}$.

In all considered examples, the number of permutations $M$ and the significance level $\alpha$ were set to 1000 and 0.01, respectively.
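The four steps of the testing procedure can be sketched in a few lines of standard-library Python. The statistic is passed in as a callable so that any estimator of the k-th order coefficient can be plugged in. In the usage example, the absolute Pearson correlation of two scalar sequences is only a stand-in for that coefficient, and the names `permutation_pvalue` and `pearson` are hypothetical.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_pvalue(x, y, statistic, M=1000, seed=0):
    """p-value p0 = 1 - F(theta0) from M random permutations of y."""
    rng = random.Random(seed)
    theta0 = statistic(x, y)          # statistic on the unpermuted data
    y_perm = list(y)
    count_le = 0
    for _ in range(M):
        rng.shuffle(y_perm)           # step 1a: permute the second sequence
        theta_m = statistic(x, y_perm)  # step 1b
        if theta_m <= theta0:         # indicator of theta0 - theta_m >= 0
            count_le += 1
    F_theta0 = count_le / M           # step 2: empirical CDF at theta0
    return 1.0 - F_theta0             # step 3

# Usage: strongly dependent scalar data; reject at level alpha if p0 < alpha.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
y = [a + 0.5 * data_rng.gauss(0.0, 1.0) for a in x]
p0 = permutation_pvalue(x, y, lambda a, b: abs(pearson(a, b)))
```

Permuting only one of the two sequences breaks the pairing between samples while leaving both marginals intact, which is what makes the permuted statistics a reference distribution under the null hypothesis of no dependence.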